I am a beginner and doing my first project. I cannot understand this line:
(reinterpret_cast<char*>(this), sizeof(*this))
I assume this is in the context of passing these two values as arguments to a function, certainly in the context of a class. Maybe you're just seeing this all around and wondering why it keeps coming up...
This is a relatively common pattern when serializing* an instance of a class. Let's break it down into the two parts (aka "arguments").
reinterpret_cast<char*>(this)
reinterpret_cast<A>(B) means whatever B is, actually treat it as A
char* is a pointer to a char
char is basically just a byte
this is a "pointer" to the current object (the instance the member function was called on), whatever its class type.
A "pointer" to something literally is the location in memory of the first byte that represents that something
So we're getting a pointer to the first byte, in memory, that represents the current object.
sizeof(*this)
sizeof(A) gets the number of bytes it takes to hold whatever type A is
*this dereferences the pointer, giving the object it points to
this (again) is a "pointer" to the current object, whatever its class type.
So we get the number of bytes, in memory, that are used to represent the current object.
Why?
Serialization
There are lots of APIs in C/C++ that operate on a block of bytes, particularly when interfacing with things outside the pure software world. Storing or transmitting data very often requires a way to "serialize" it.
The location and the length/size (the arguments to the function) are the minimum bits of information needed to describe a block of memory, aka a block of bytes. You'll see this kind of thing when you're serializing a class.
Example class that might use this
These days it's common to need to send "pixel" data out over a serial line to control "addressable" LEDs. It might be helpful to do something like this:
// Some function that sends the raw bytes on the wire to the WS2812 pixels
extern void sendBytes(char const * bytes, size_t num);
struct Pixel {
// WS2812 uses GRB order instead of RGB :(
uint8_t g;
uint8_t r;
uint8_t b;
void display() {
sendBytes(reinterpret_cast<char*>(this), sizeof(*this));
}
// Other functions
};
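To see what those two arguments actually hand over, here is a minimal sketch that replaces the hardware-facing sendBytes with a stand-in that appends to a std::vector. The captured vector and the body of sendBytes are assumptions for illustration only; the real function would drive the serial line.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the real wire function: it only records the bytes.
std::vector<std::uint8_t> captured;

void sendBytes(char const* bytes, std::size_t num) {
    for (std::size_t i = 0; i < num; ++i) {
        captured.push_back(static_cast<std::uint8_t>(bytes[i]));
    }
}

struct Pixel {
    // WS2812 uses GRB order, so the members are declared in that order.
    std::uint8_t g;
    std::uint8_t r;
    std::uint8_t b;

    void display() {
        // Address of the first byte of this object, plus how many bytes it occupies.
        sendBytes(reinterpret_cast<char*>(this), sizeof(*this));
    }
};

// The pattern only sends the right bytes if the object has no padding.
static_assert(sizeof(Pixel) == 3, "Pixel must be exactly three bytes");
```

Calling display() on a Pixel with g = 1, r = 2, b = 3 leaves those three values in captured, in declaration order.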
Related
I am writing in c++ for the Nintendo DS (With 4MB of RAM). I have a button class that stores data like the x,y location and length. Which of the following would take less memory?
Method 1, class variables length, x, y, and halfPoint
Button::Button(int setX, int setY, int setLength)
{
x = setX;
y = setY;
length = setLength;
halfPoint = length/2;
}
//access variable with buttonName.halfPoint
Method 2, class variables length, x and y
Button::Button(int setX, int setY, int setLength)
{
x = setX;
y = setY;
length = setLength;
}
int Button::getHalfPoint()
{
return length/2;
}
//access variable with buttonName.getHalfPoint()
Any help is appreciated. (And in the real code I calculate a location much more complex than the half point)
The getHalfPoint() method will take up less room if there are a lot of Buttons. Why? Because member functions are actually just implemented by the compiler as regular functions with an implied first argument of a pointer to the object. So your function is rewritten by the compiler as:
int getHalfPoint(Button* this)
{
return this->length/2;
}
(It is a bit more complicated, because of name mangling, but this will do for an explanation.)
You should carefully consider the extra amount of computation that will have to be done to avoid storing 4 extra bytes, however. And as Cameron mentions, the compiler might add extra space to the object anyway, depending upon the architecture (I think that is likely to happen with RISC architectures).
Well, that depends!
The method code exists exactly once in memory, but a member variable exists once for each object instance.
So you'll have to count the number of instances you create (multiplied by the sizeof the variable), and compare that to the size of the compiled method (using a tool like e.g. objdump).
You'll also want to compare the size of your Button with and without the extra variable, because it's entirely possible that the compiler pads it to the same length anyway.
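A quick way to make that size comparison is to let the compiler report it. The sketch below uses two hypothetical struct names, ButtonStored and ButtonComputed, standing in for the two methods from the question. The actual numbers depend on the platform's int size and padding rules.

```cpp
#include <cassert>
#include <cstddef>

// Method 1: the derived value is stored in every instance.
struct ButtonStored {
    int x, y, length, halfPoint;
    ButtonStored(int setX, int setY, int setLength)
        : x(setX), y(setY), length(setLength), halfPoint(setLength / 2) {}
};

// Method 2: the derived value is recomputed on demand.
struct ButtonComputed {
    int x, y, length;
    ButtonComputed(int setX, int setY, int setLength)
        : x(setX), y(setY), length(setLength) {}
    int getHalfPoint() const { return length / 2; }
};

// Per-instance cost of the extra member on this platform, in bytes.
const std::size_t kExtraBytesPerButton =
    sizeof(ButtonStored) - sizeof(ButtonComputed);
```

Multiplying kExtraBytesPerButton by the number of buttons gives the data cost of Method 1, which can then be weighed against the code size of getHalfPoint().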
I suggest you define the getHalfPoint method inside your class. This makes it implicitly inline, which lets the compiler paste the code at each call site.
It is possible that the function body compiles to a single assembly instruction which, depending on your platform, occupies 4 bytes or less. In that case, there is probably no benefit in keeping a second variable that holds half of another variable. Research "right shifting". Also, to take full advantage, make the variable unsigned int. (Right-shifting a negative signed integer is implementation-defined before C++20.)
The inline capability means that the content of the function will be pasted wherever there is a call to the function. This reduces the overhead of a function call (such as the branch instruction, pushing and popping arguments). The reduction of a branch instruction may even speed up the program because there is no flushing of the instruction cache or pipeline.
Here's my issue: I need to pass back two uint32_t's via a single uint32_t (because of how the API is set up...). I can hard code whatever other values I need to reverse the operation, but the parameter passed between functions needs to stay a single uint32_t.
This would be trivial if I could just bit-shift the two 32-bit ints into a single 64-bit int (like what was explained here), but the compiler wouldn't like that. I've also seen mathematical pairing functions, but I'm not sure if that's what I need in this case.
I've thought of setting up a simple cipher: the uint32_t could be the ciphertext, and I could just hard code the key. This is an example, but that seems like overkill.
Is this even possible?
It is not possible to store more than 32 bits of information using only 32 bits. This is a basic result of information theory.
If you know that you're only using the low-order 16 bits of each value, you could shift one left 16 bits and combine them that way. But there's absolutely no way to get 64 bits worth of information (or even 33 bits) into 32 bits, period.
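The 16-bit case mentioned above can be sketched as follows. The function names are made up for illustration, and anything above the low 16 bits of either input is silently discarded, so this only helps when both values are known to fit.

```cpp
#include <cassert>
#include <cstdint>

// Assumes each value fits in 16 bits; higher bits are dropped by the mask.
std::uint32_t pack16(std::uint32_t hi, std::uint32_t lo) {
    return ((hi & 0xFFFFu) << 16) | (lo & 0xFFFFu);
}

// Recover the value that was placed in the upper half.
std::uint32_t unpack_hi(std::uint32_t packed) { return packed >> 16; }

// Recover the value that was placed in the lower half.
std::uint32_t unpack_lo(std::uint32_t packed) { return packed & 0xFFFFu; }
```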
Depending on how much trouble this is really worth, you could:
create a global array or vector of std::pair<uint32_t,uint32_t>
pass an index into the function, then your "reverse" function just looks up the result in the array.
write some code to decide which index to use when you have a pair to pass. The index needs to not be in use by anyone else, and since the array is global there may be thread-safety issues. Essentially what you are writing is a simple memory allocator.
As a special case, on a machine with 32 bit data pointers you could allocate the struct and reinterpret_cast the pointer to and from uint32_t. So you don't need any globals.
Beware that you need to know whether or not the function you pass the value into might store the value somewhere to be "decoded" later, in which case you have a more difficult resource-management problem than if the function is certain to have finished using it by the time it returns.
In the easy case, and if the code you're writing doesn't need to be re-entrant at all, then you only need to use one index at a time. That means you don't need an array, just one pair. You could pass 0 to the function regardless of the values, and have the decoder ignore its input and look in the global location.
If both special cases apply (32 bit and no retaining of the value), then you can put the pair on the stack, and use no globals and no dynamic allocation even if your code does need to be re-entrant.
None of this is really recommended, but it could solve the problem you have.
You can use an intermediate global data structure to store the pair of uint32_t on it, using your only uint32_t parameter as the index on the structure:
struct my_pair {
uint32_t a, b;
};
std::map<uint32_t, my_pair> global_pair_map;
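uint32_t register_new_pair(uint32_t a, uint32_t b) {
// Add the pair of (a, b) to the map global_pair_map on a new key, and return the
// new key value. A simple counter is used here; it is not thread-safe.
static uint32_t next_key = 0;
uint32_t key = next_key++;
my_pair p = {a, b};
global_pair_map[key] = p;
return key;
}
void release_pair(uint32_t key) {
// Remove the key from the global_pair_map.
global_pair_map.erase(key);
}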
void callback(uint32_t user_data) {
my_pair& p = global_pair_map[user_data];
// Use your pair of uint32_t with p.a, and p.b.
}
int main() {
uint32_t key = register_new_pair(number1, number2);
register_callback(callback, key);
}
I'm writing an application and I had to do some pointer arithmetic. However, this application will run on different architectures! I was not sure whether this would be a problem, but after reading this article, I thought I had to change it.
Here was my original code that I didn't like much:
class Frame{
/* ... */
protected:
const u_char* const m_pLayerHeader; // Where header of this layer starts
int m_iHeaderLength; // Length of the header of this layer
int m_iFrameLength; // Header + payloads length
};
/**
* Get the pointer to the payload of the current layer
* @return A pointer to the payload of the current layer
*/
const u_char* Frame::getPayload() const
{
// FIXME : Pointer arithmetic, portability!
return m_pLayerHeader + m_iHeaderLength;
}
Pretty bad isn't it! Adding an int value to a u_char pointer! But then I changed to this:
const u_char* Frame::getPayload() const
{
return &m_pLayerHeader[m_iHeaderLength];
}
I think now, the compiler is able to say how much to jump! Right? Is the operation [] on array considered as pointer arithmetic? Does it fix the portability problem?
p + i and &p[i] are synonyms when p is a pointer and i a value of integral type. So much that you can even write &i[p] and it's still valid (just as you can write i + p).
The portability issue in the example you link was coming from sizeof(int) varying across platforms. Your code is just fine, assuming m_iHeaderLength is the number of u_chars you want to skip.
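The equivalence is easy to check directly. In this sketch the two helper functions are hypothetical stand-ins for the two versions of getPayload(), written against plain unsigned char so the example does not depend on u_char being defined.

```cpp
#include <cassert>

// Version 1: explicit pointer arithmetic.
const unsigned char* payload_by_addition(const unsigned char* header,
                                         int header_length) {
    return header + header_length;
}

// Version 2: address of an indexed element; same arithmetic, different spelling.
const unsigned char* payload_by_index(const unsigned char* header,
                                      int header_length) {
    return &header[header_length];
}
```

Both functions return the same address for every valid offset, so switching from one to the other changes nothing about portability.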
In your code you are advancing m_pLayerHeader by m_iHeaderLength u_chars. As long as whatever wrote the data you are pointing into has the same size for u_char, and m_iHeaderLength is the number of u_chars in the header area, you are safe.
But if m_iHeaderLength is really referring to bytes, and not u_chars, then you may have a problem if m_iHeaderLength is supposed to advance the pointer past other types than char.
Say you are sending data from a 32-bit system to a 16-bit system, and your header area is defined like this
struct Header {
int something;
int somethingElse;
};
Assume that is only part of the total message defined by the struct Frame.
On the 32-bit machine you write the data out to a port that the 16-bit machine will read from.
port->write(myPacket, sizeof(Frame));
On the 16-bit machine you have the same Header definition, and try to read the information.
port->read(packetBuffer, sizeof(Frame));
You are already in trouble because you've tried to read only half the amount of data the sender wrote. The size of int on the 16-bit machine doing the reading is two, so the size of the header there is four. But the header size was eight on the sending machine: two ints of four bytes each.
Now you attempt to advance your pointer
m_iHeaderLength = sizeof(Header);
...
packetBuffer += m_iHeaderLength;
packetBuffer will still be pointing into the data which was in the header in the frame sent from the originator.
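One common way to avoid the size mismatch described above is to declare the header with fixed-width types, so that sizeof(Header) is the same on both machines. This is a sketch under that assumption; payload_of is a hypothetical helper, and byte order can still differ between the two ends and has to be handled separately.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Fixed-width members so that sender and receiver agree on the header size.
struct Header {
    std::int32_t something;
    std::int32_t somethingElse;
};

// Fail the build on any platform where the layout is not what the protocol expects.
static_assert(sizeof(Header) == 8, "Header must be 8 bytes on every platform");

// Skip the header by its agreed size in bytes.
const unsigned char* payload_of(const unsigned char* packet) {
    return packet + sizeof(Header);
}
```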
If there is a portability problem, then no, that wouldn't fix it. m_pLayerHeader + m_iHeaderLength and &m_pLayerHeader[m_iHeaderLength] are completely equivalent (in this case).
I'm porting an application from 32 bit to 64 bit.
It is C style coding (legacy product) although it is C++. I have an issue where a combination of union and struct are used to store values. Here a custom datatype called "Any" is used that should hold data of any basic datatype. The implementation of Any is as follows:
typedef struct typedvalue
{
long data; // to hold all other types of 4 bytes or less
short id; // this tells what type "data" is holding
short sign; // this differentiates the double value from the rest
}typedvalue;
typedef union Any
{
double any_any;
double any_double; // to hold double value
typedvalue any_typedvalue;
}Any;
The union is 8 bytes in size. They used a union so that only one value is held at any given time, and a struct to record which type it is. You can store a double, long, string, char, float or int value at any given time. That's the idea.
If it's a double value, it is stored in any_double. If it's any other type, it is stored in "data" and the type of the value is stored in "id". The "sign" field tells whether the "Any" is holding a double or another type.
any_any is used liberally in the code to copy the value in the address space irrespective of the type. (This is our biggest problem since we do not know at a given time what it will hold!)
If "Any" is supposed to hold a string or pointer, it is stored in "data" (which is of type long). On 64-bit, this is where the problem lies: pointers are 8 bytes, so we would need to change the "long" to an 8-byte equivalent (long long). But that would increase the size of the union to 16 bytes, and the liberal use of "any_any" would cause problems. There are too many uses of "any_any", and you are never sure what it holds.
I have already tried these steps, without success:
1. Changed "long data" to "long long data" in the struct, which makes the size of the union 16 bytes. - This does not allow the data to be passed as "any_any" (8 bytes).
2. Declared the struct as a pointer inside the union, and changed "long data" to "long long data" inside the struct. - The issue here was that, since it's a pointer, we need to allocate memory for the struct. The liberal use of "any_any" makes it difficult to allocate memory. Sometimes we might overwrite the memory and hence erase the value.
3. Created a separate collection that holds the value for "data" (a key-value pair). - This will not work because this implementation is at the core of the application, and the collection would run into millions of entries.
Can anybody help me in this?
"Can anybody help me" this sounds like a cry of desperation, and I totally understand it.
Whoever wrote this code had absolutely no respect for future-proofing, or of portability, and now you're paying the price.
(Let this be a lesson to anyone who says "but our platform is 32bit! we will never use 64bit!")
I know you're going to say "but the codebase is too big", but you are better off rewriting the product. And do it properly this time!
Ignoring the fact that the original design is insane, you could use <stdint.h> (or soon <cstdint>) to get a little bit of predictability:
struct typedvalue
{
uint16_t id;
uint16_t sign;
uint32_t data;
};
union any
{
char any_raw[8];
double any_double;
typedvalue any_typedvalue;
};
You're still not guaranteed that typedvalue will be tightly packed, since there are no alignment guarantees for non-char members. You could make a struct Foo { char x[8]; }; and type-pun your way around, like *(uint32_t*)(&Foo.x[0]) and *(uint16_t*)(&Foo.x[4]) if you must, but that too would be extremely ugly.
If you are in C++0x, I would definitely throw in a static assertion somewhere for sizeof(typedvalue) == sizeof(double).
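The static assertion suggested above could look like the following sketch. It repeats the fixed-width definitions so the example is self-contained. Note that it only pins down the current 8-byte layout; it does not by itself make room for a 64-bit pointer.

```cpp
#include <cassert>
#include <cstdint>

struct typedvalue {
    std::uint16_t id;    // which type "data" holds
    std::uint16_t sign;  // distinguishes a double from everything else
    std::uint32_t data;  // payload for types of 4 bytes or less
};

union any {
    char any_raw[8];
    double any_double;
    typedvalue any_typedvalue;
};

// Compilation stops here if a platform pads typedvalue differently.
static_assert(sizeof(typedvalue) == sizeof(double),
              "typedvalue must overlay a double exactly");
static_assert(sizeof(any) == sizeof(double),
              "any must stay the size of a double");
```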
If you need to store both an 8 byte pointer and a "type" field then you have no choice but to use at least 9 bytes, and on a 64-bit system alignment will likely pad that out to 16 bytes.
Your data structure should look something like:
typedef struct {
union {
void *any_pointer;
double any_double;
long any_long;
int any_int;
} any;
char my_type;
} any;
If using C++0x consider using a strongly typed enumeration for the my_type field. In earlier versions the storage required for an enum is implementation dependent and likely to be more than one byte.
To save memory you could use (compiler specific) directives to request optimal packing of the data structure, but the resulting mis-aligned memory accesses may cause performance issues.
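As a rough illustration of the tagged layout with a strongly typed enumeration, here is a sketch. The names TaggedAny, AnyKind, make_double and make_pointer are invented for this example and do not come from the original code base.

```cpp
#include <cassert>
#include <cstdint>

// One-byte tag; the underlying type is fixed explicitly (C++11 and later).
enum class AnyKind : std::uint8_t { Pointer, Double, Long, Int };

struct TaggedAny {
    union {
        void* any_pointer;
        double any_double;
        long any_long;
        int any_int;
    } value;
    AnyKind kind;  // says which union member is currently valid
};

TaggedAny make_double(double d) {
    TaggedAny a;
    a.value.any_double = d;
    a.kind = AnyKind::Double;
    return a;
}

TaggedAny make_pointer(void* p) {
    TaggedAny a;
    a.value.any_pointer = p;
    a.kind = AnyKind::Pointer;
    return a;
}
```

Because the tag lives outside the union, sizeof(TaggedAny) is larger than a pointer; on a typical 64-bit target alignment is likely to bring it to 16 bytes, as described above.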
This is related to my question asked here today on SO. Is there a better way to build a packet to send over serial rather than doing this:
unsigned char buff[255];
buff[0] = 0x02;
buff[1] = 0x01;
buff[2] = 0x03;
WriteFile(.., buff,3, &dwBytesWrite,..);
Note: I have about twenty commands to send, so if there was a better way to send these bytes to the serial device in a more concise manner rather than having to specify each byte, it would be great. Each byte is hexadecimal, with the last byte being the checksum. I should clarify that I know I will have to specify each byte to build the commands, but is there a better way than having to specify each array position?
You can initialize static buffers like so:
const unsigned char command[] = {0x13, 0x37, 0xf0, 0x0d};
You could even use these to initialize non-const buffers and then replace only changing bytes by index.
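A minimal sketch of that idea: copy a constant template into the working buffer, patch the byte that varies, then fill in the checksum. The checksum rule used here (sum of the preceding bytes modulo 256) is only an assumption, since the device's real rule is not stated, and the names kCommandTemplate, simple_checksum and build_command are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Fixed template for one command; the last byte is a placeholder for the checksum.
const unsigned char kCommandTemplate[] = {0x02, 0x01, 0x00};

// Assumed checksum: sum of the given bytes modulo 256.
unsigned char simple_checksum(const unsigned char* data, std::size_t len) {
    unsigned sum = 0;
    for (std::size_t i = 0; i < len; ++i) {
        sum += data[i];
    }
    return static_cast<unsigned char>(sum & 0xFFu);
}

// Copies the template into out, patches the argument byte, then sets the checksum.
// Returns the number of bytes to send.
std::size_t build_command(unsigned char* out, unsigned char argument) {
    const std::size_t len = sizeof(kCommandTemplate);
    std::memcpy(out, kCommandTemplate, len);
    out[1] = argument;
    out[len - 1] = simple_checksum(out, len - 1);
    return len;
}
```

The returned length is what would be passed to WriteFile as the byte count, with the buffer itself as the data pointer.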
Not sure what you're asking. If you are asking about the problem of setting the bytes one by one and messing up the data, this is usually done with a packed struct whose members have meaningful names. Like:
#pragma pack(push)
#pragma pack(1)
struct FooHeader {
uint someField;
byte someFlag;
dword someStatus;
};
#pragma pack(pop)
FooHeader hdr;
hdr.someField = 2;
hdr.someFlag = 3;
hdr.someStatus = 4;
WriteFile(..., &hdr, sizeof(hdr), ...);
Is there a better way to build a packet than assembling it byte by byte?
Yes, but it will require some thought and some careful engineering. Many of the other answers tell you other mechanisms by which you can put together a sequence of bytes in C++. But I suggest you design an abstraction that represents a part of a packet:
class PacketField {
public:
void add_to_packet(Packet& p);
};
Then you can define various subclasses:
Add a single byte to the packet
Add a 16-bit integer in big-endian order. Another for little-endian. Other widths besides 16.
Add a string to the packet; code the string by inserting the length and then the bytes.
You also can define a higher-order version:
PacketField sequence(PacketField first, PacketField second);
Returns a field that consists of the two arguments in sequence. If you like operator overloading you could overload this as + or <<.
Your underlying Packet abstraction will just be an extensible sequence of bytes (dynamic array) with some kind of write method.
If you wind up programming a lot of network protocols, you'll find this sort of design pays off big time.
Edit: The point of the PacketField class is composability and reuse:
By composing packet fields you can create more complex packet fields. For example, you could define "add a TCP header" as a function from PacketFields to PacketFields.
With luck you build up a library of PacketFields that are specific to your application or protocol family or whatever. Then you reuse the fields in the library.
You can create subclasses of PacketField that take extra parameters.
It's quite possible that you can do something equally nice without having to have this extra level of indirection; I'm recommending it because I've seen it used effectively in other applications. You are decoupling the knowledge of how to build a packet (which can be applied to any packet, any time) from the act of actually building a particular packet. Separating concerns like this can help reuse.
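To make the composability concrete, here is one possible sketch. It swaps the subclass hierarchy for std::function values, which keeps the example short; that is a substitution made for illustration, not the design the answer prescribes. Packet is assumed to be a plain byte vector, and byte_field, u16_big_endian and string_field are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

typedef std::vector<std::uint8_t> Packet;

// A field is anything that knows how to append itself to a packet.
typedef std::function<void(Packet&)> PacketField;

// Adds a single byte.
PacketField byte_field(std::uint8_t value) {
    return [value](Packet& p) { p.push_back(value); };
}

// Adds a 16-bit integer, most significant byte first.
PacketField u16_big_endian(std::uint16_t value) {
    return [value](Packet& p) {
        p.push_back(static_cast<std::uint8_t>(value >> 8));
        p.push_back(static_cast<std::uint8_t>(value & 0xFF));
    };
}

// Adds one length byte followed by the characters.
// Assumes the string is shorter than 256 bytes.
PacketField string_field(const std::string& text) {
    return [text](Packet& p) {
        p.push_back(static_cast<std::uint8_t>(text.size()));
        p.insert(p.end(), text.begin(), text.end());
    };
}

// Higher-order field: first, then second.
PacketField sequence(PacketField first, PacketField second) {
    return [first, second](Packet& p) {
        first(p);
        second(p);
    };
}
```

A composed field is applied by calling it on a Packet, for example sequence(byte_field(0x02), u16_big_endian(0x1234))(packet), and it can be stored and reused for any number of packets.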
Yes, there is a better method. Have your classes read from and write to a packed buffer. You could even implement this as an interface. Templates would help too.
An example of writing:
#include <cstring>
#include <iostream>

template <typename Member_Type>
void Store_Value_In_Buffer(const Member_Type& member,
unsigned char *& p_buffer)
{
// memcpy avoids the misaligned write that a pointer cast could cause.
std::memcpy(p_buffer, &member, sizeof(Member_Type));
p_buffer += sizeof(Member_Type);
}
struct My_Class
{
unsigned int datum;
void store_to_buffer(unsigned char *& p_buffer) const
{
Store_Value_In_Buffer(datum, p_buffer);
}
};
//...
unsigned char buffer[256];
unsigned char * p_buffer(buffer);
My_Class object;
object.datum = 5;
object.store_to_buffer(p_buffer);
// Write only the bytes that were stored, starting at the beginning of the buffer.
std::cout.write(reinterpret_cast<const char*>(buffer), p_buffer - buffer);
Part of the interface is also to query the objects for the size that they would occupy in the buffer, say a method size_in_buffer. This is left as an exercise for the reader. :-)
There is a much better way, which is using structs to set the structures. This is usually how network packets are built on a low level.
For example, say you have packets which have an id, length, flag byte, and data, you'd do something like this:
struct packet_header {
int id;
byte length;
byte flags;
};
byte my_packet[100];
packet_header *header = reinterpret_cast<packet_header*>(my_packet);
header->id = 20;
header->length = 10; // This can be set automatically by a function, maybe?
// etc.
header++; // Header now points to the data section.
Do note that you're going to have to make sure that the structures are "packed", i.e. when you write byte length, it really takes up a byte. Usually, you'd achieve this using something like #pragma pack or similar (you'll have to read about your compiler's pragma settings).
Also, note that you should probably use functions to do common operations. For example, create a function which gets as input the size, data to send, and other information, and fills out the packet header and data for you. This way, you can perform calculations about the actual size you want to write in the length field, you can calculate the CRC inside the function, etc.
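A helper of the kind described above might look like the sketch below. It writes the header one byte at a time instead of casting a struct onto the buffer, which is a different technique from the struct overlay shown earlier and sidesteps packing pragmas. The field widths, the little-endian order for id, and the name build_packet are all assumptions for illustration; a CRC step would go just before the return.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Layout assumed here: id (4 bytes, little-endian), length (1 byte),
// flags (1 byte), then the data bytes.
// Assumes data.size() fits in one byte.
std::vector<std::uint8_t> build_packet(std::uint32_t id, std::uint8_t flags,
                                       const std::vector<std::uint8_t>& data) {
    std::vector<std::uint8_t> packet;
    // Emit id least significant byte first.
    for (int shift = 0; shift < 32; shift += 8) {
        packet.push_back(static_cast<std::uint8_t>((id >> shift) & 0xFFu));
    }
    // The length field is filled in automatically from the payload.
    packet.push_back(static_cast<std::uint8_t>(data.size()));
    packet.push_back(flags);
    packet.insert(packet.end(), data.begin(), data.end());
    return packet;
}
```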
Edit: This is a C-centric way of doing things, which is the style of a lot of networking code. A more C++-centric (object oriented) approach could also work, but I'm less familiar with them.
const char *c = "\x02\x02\x03";