Fast approach to wrapping data in a struct/class - c++

EDIT: The main intent is to allow manipulating underlying data as part of an encapsulated struct as opposed to direct data manipulation.
Which of the following approaches is recommended when it comes to wrapping some data inside a struct:
Keep a pointer to the data within the struct:
new s(buf), which stores buf in a local field (s->buf = buf)
reinterpret_cast-ing a memory address to a struct:
reinterpret_cast<s*>(buf)
Use the new operator against the memory address where the data is located:
new(buf) s;
Here is a sample program for these approaches:
#include <iostream>
using namespace std;
struct s {
int* i;
s(int* buf) : i(buf) {}
int getValue() { return *i * 2; }
};
struct s2 {
int i;
int getValue() { return i * 2; }
};
int main() {
int buf = 10;
s a(&buf);
cout << "value: " << a.getValue() << ", size: " << sizeof(a) << ", address: " << &a << ", buf-address: " << &buf << endl;
s2* a2 = new(&buf) s2;
cout << "value: " << a2->getValue() << ", size: " << sizeof(*a2) << ", address: " << a2 << ", buf-address: " << &buf << endl;
s2* a3 = reinterpret_cast<s2*>(&buf);
cout << "value: " << a3->getValue() << ", size: " << sizeof(*a3) << ", address: " << a3 << ", buf-address: " << &buf << endl;
}
And the output:
value: 20, size: 4, address: 0027F958, buf-address: 0027F964
value: 20, size: 4, address: 0027F964, buf-address: 0027F964
value: 20, size: 4, address: 0027F964, buf-address: 0027F964
Both size & time are important. Also, maintainability is important, e.g. someone might add by mistake a virtual function to s2 (which will mess up the data alignment).
Thanks!

None of those are even remotely good ideas, although the first one is passable with some modifications. reinterpret_cast doesn't work the way you think it does, and I'm not sure what exactly you're trying to achieve with placement new. Store a smart pointer of some sort in the first one to avoid the obvious issues with lifetime, and the first option isn't bad.
There is a fourth option: just store the data in a struct, and provide whatever encapsulated access you desire.
struct data {
data(int i_) : i(i_) { }
int i;
};
struct s {
s(int i_) : i(i_) { }
data i;
};
Rereading your question, it appears as though maybe your intent is for this struct to be an internal detail of some larger object. In that case, the lifetime issues with the first solution are likely taken care of, so storing a raw pointer is less of a bad idea. Absent additional details, though, I still recommend the fourth option.

Placement new will still call the constructor, wiping out anything that's in the buffer already if such a constructor exists (or is created unknowingly in the future) so I don't think that's a safe option. reinterpret_cast is undefined behavior even though it may appear to work for you. Storing a local pointer seems to be the best option although you've only given a very tiny inkling of what you're trying to do.
If you're attempting serialization here, remember important issues like sizeof(int) and endianness.

Using reinterpret_cast with anything other than char* is an undefined behavior. So 2/ is obviously out.
1 and 3 are OK but 1/ is the most straightforward.

You may prefer to encasulate data, you may want to use a (void*) pointer to a struct or class, these allows type encapsulation, and allow to extend the data in your struct, in next versions of your code:
struct HiddenCode{
int Field1;
char Field2;
};
void transferData(void* anyptr)
{
// this method "knows" that "anyptr" is a "HiddenCode*"
HiddenCode* MyHiddenCode = (void*) anyptr;
// do something else
}
void main()
{
HiddenCode* MyHiddenCode = new HiddenCode();
MyHiddenCode->Field1 = 5;
MyHiddenCode->Field2 = '1';
void* anyptr = (void*)MyHiddenCode;
transferData(anyptr);
}
Cheers.

Related

Changing the address of a pointer after construction

I've tried to create a minimal example of my problem. I'm trying to check if the address of an void pointer is NULL or not. The address should be overgiven by constructing the class, and should be const. I wrote the class below.
MyPointer.h:
public:
MyPointer(const void* activeApp) : m_activeApp(activeApp){
std::cout <<"adresse on construction: " << m_activeApp << std::endl;
};
MyPointer.cpp:
void printAdress();
private:
const void* m_activeApp;
};
The Methode "printAdress" should be able to print the correct address of the given pointer.
int main(void){
void* p_activeApp = nullptr;
std::cout << p_activeApp << std::endl;
MyPointer myPointer(p_activeApp);
std::cout << "Should be 0: " ;
myPointer.printAddress();
p_activeApp = new(bool);
std::cout << "Should be anything: ";
myPointer.printAddress();
}
void MyPointer::printAddress() {
std::cout << this->m_activeApp << std::endl;
};
Of course it doesn't work, because the m_activeApp still points to NULL, but how can I change this?
If you want to change what myPointer.m_activeApp points to, you have to set the pointer to a different value, simple as that. This pointer and p_activeApp are two distinct, independent pointers. Changing one does not change the other.
What you can do is to make one pointer a reference to the other pointer instead. Then, changing one would also change the other. This will work, though be warned, it won't be good programming style.
Pointers are integers (representing a memory location), so imagine void* to be a number. You set p_activeApp to 0, then you construct a MyPointer, and its internal pointer is set to 0.
Then you change the first number by using the new operator, p_activeApp now gets a value (to that new memory address), but there's no reason for m_activeApp to also change. It still points at 0, no matter what p_activeApp changes to.
I fixed it like this:
public:
MyPointer(void** activeApp) : m_pp_activeApp(activeApp) {
std::cout <<"address on construction: " << *m_pp_activeApp << std::endl;
};
void printAddress();
private:
void** m_pp_activeApp;
};
int main(void){
void* p_activeApp = nullptr;
void** pp_activeApp = &p_activeApp;
MyPointer myPointer(pp_activeApp);
std::cout << "Should be 0: " ;
myPointer.printAddress();
p_activeApp = new(bool);
std::cout << "Should be anything: ";
myPointer.printAddress();
}
void MyPointer::printAddress() {
std::cout << *m_pp_activeApp << std::endl;
};

Placement new and aligning for possible offset memory

I've been reading up on placement new, and I'm not sure if I'm "getting" it fully or not when it comes to proper alignment.
I've written the following test program to attempt to allocate some memory to an aligned spot:
#include <iostream>
#include <cstdint>
using namespace std;
unsigned char* mem = nullptr;
struct A
{
double d;
char c[5];
};
struct B
{
float f;
int a;
char c[2];
double d;
};
void InitMemory()
{
mem = new unsigned char[1024];
}
int main() {
// your code goes here
InitMemory();
//512 byte blocks to write structs A and B to, purposefully misaligned
unsigned char* memoryBlockForStructA = mem + 1;
unsigned char* memoryBlockForStructB = mem + 512;
unsigned char* firstAInMemory = (unsigned char*)(uintptr_t(memoryBlockForStructA) + uintptr_t(alignof(A) - 1) & ~uintptr_t(alignof(A) - 1));
A* firstA = new(firstAInMemory) A();
A* secondA = new(firstA + 1) A();
A* thirdA = new(firstA + 2) A();
cout << "Alignment of A Block: " << endl;
cout << "Memory Start: " << (void*)&(*memoryBlockForStructA) << endl;
cout << "Starting Address of firstA: " << (void*)&(*firstA) << endl;
cout << "Starting Address of secondA: " << (void*)&(*secondA) << endl;
cout << "Starting Address of thirdA: " << (void*)&(*thirdA) << endl;
cout << "Sizeof(A): " << sizeof(A) << endl << "Alignof(A): " << alignof(A) << endl;
return 0;
}
Output:
Alignment of A Block:
Memory Start: 0x563fe1239c21
Starting Address of firstA: 0x563fe1239c28
Starting Address of secondA: 0x563fe1239c38
Starting Address of thirdA: 0x563fe1239c48
Sizeof(A): 16
Alignof(A): 8
The output appears to be valid, but I still have some questions about it.
Some questions I have are:
Will fourthA, fifthA, etc... all be aligned as well?
Is there a simpler way of finding a properly aligned memory location?
In the case of struct B, it is set up to not be memory friendly. Do I need to reconstruct it so that the largest members are at the top of the struct, and the smallest members are at the bottom? Or will the compiler automatically pad everything so that it's member d will not be malaligned?
Will fourthA, fifthA, etc... all be aligned as well?
yes if the alignement of a type is a multiple of the size
witch is (i think) always the case
Is there a simpler way of finding a properly aligned memory location?
yes
http://en.cppreference.com/w/cpp/language/alignas
or
http://en.cppreference.com/w/cpp/memory/align
as Dan M said.
In the case of struct B, it is set up to not be memory friendly. Do I need to reconstruct it so that the largest members are at the top of the struct, and the smallest members are at the bottom? Or will the compiler automatically pad everything so that it's member d will not be malaligned?
you should reorganize if you think about it.
i don't think compiler will reorganize element in a struct for you.
because often when interpreting raw data (coming from file, network ...) this data is often just interpreted as a struct and 2 compiler reorganizing differently could break code.
I hope my explanation are clear and that I did not make any mistakes

Converting char* to int after using strdup()

Why after using strdup(value) (int)value returns you different output than before?
How to get the same output?
My short example went bad, please use the long one:
Here the full code for tests:
#include <stdio.h>
#include <iostream>
int main()
{
//The First Part
char *c = "ARD-642564";
char *ca = "ARD-642564";
std::cout << c << std::endl;
std::cout << ca << std::endl;
//c and ca are equal
std::cout << (int)c << std::endl;
std::cout << (int)ca << std::endl;
//The Second Part
c = strdup("ARD-642564");
ca = strdup("ARD-642564");
std::cout << c << std::endl;
std::cout << ca << std::endl;
//c and ca are NOT equal Why?
std::cout << (int)c << std::endl;
std::cout << (int)ca << std::endl;
int x;
std::cin >> x;
}
Because an array decays to a pointer in your case, you are printing a pointer (ie, on non-exotic computers, a memory address). There is no guarantee that a pointer fits in an int.
In the first part of your code, c and ca don't have to be equal. Your compiler performs a sort of memory optimization (see here for a full answer).
In the second part, strdup allocates dynamically a string twice, such that the returned pointers are not equal. The compiler does not optimize these calls because he does not seem to control the definition of strdup.
In both cases, c and ca may not be equal.
"The strdup() function shall return a pointer to a new string, which is a duplicate of the string pointed to by s1." source
So it's quite understandable that the pointers differ.

Correctly Deal With Byte Alignment Issues -- Between 16 Bit Embeded System and 32 Bit Desktop via UDP

The application I am working on receives C style structs from an embed system whose code was generated to target a 16 bit processor. The application which speaks with the embedded system is built with either a 32 bit gcc compiler, or a 32 bit MSVC c++ compiler. The communication between the application and the embedded system takes place via UDP packets over ethernet or modem.
The payload within the UDP packets consist of various different C style structs. On the application side a C++ style reinterpret_cast is capable of taking the unsigned byte array and casting it into the appropriate struct.
However, I run into problems with reinterpret_cast when the struct contains enumerated values. The 16 bit Watcom compiler will treat enumerated values as an uint8_t type. However, on the application side the enumerated values are treated as 32 bit values. When I receive a packet with enumerated values in it the data gets garbled because the size of the struct on the application side is larger the struct on the embedded side.
The solution to this problem, so far, has been to change the enumerated type within the struct on the application side to an uint8_t. However, this is not an optimal solution because we can no longer use the member as an enumerated type.
What I am looking for is a solution which will allow me to use a simple cast operation without having to tamper with the struct definition in the source on the application side. By doing so, I can use the struct as is in the upper layers of my application.
As noted, correctly deal with the issue is proper serialization and deserialization.
But it doesn't mean we can't try some hacks.
Option 1:
If you particular compiler support packing the enum (in my case gcc 4.7 in windows), this might work:
typedef enum { VALUE_1 = 1, VALUE_2, VALUE_3 }__attribute__ ((__packed__)) TheRealEnum;
Option 2:
If your particular compiler supports class sizes of < 4 bytes, you can use a HackedEnum class which uses operator overloading for the conversion (note the gcc attribute you might not want it):
class HackedEnum
{
private:
uint8_t evalue;
public:
void operator=(const TheRealEnum v) { evalue = v; };
operator TheRealEnum() { return (TheRealEnum)evalue; };
}__attribute__((packed));
You would replace TheRealEnum in your structures for HackedEnum, but you still continue using it as TheRealEnum.
A full example to see it working:
#include <iostream>
#include <stddef.h>
using namespace std;
#pragma pack(push, 1)
typedef enum { VALUE_1 = 1, VALUE_2, VALUE_3 } TheRealEnum;
typedef struct
{
uint16_t v1;
uint8_t enumValue;
uint16_t v2;
}__attribute__((packed)) ShortStruct;
typedef struct
{
uint16_t v1;
TheRealEnum enumValue;
uint16_t v2;
}__attribute__((packed)) LongStruct;
class HackedEnum
{
private:
uint8_t evalue;
public:
void operator=(const TheRealEnum v) { evalue = v; };
operator TheRealEnum() { return (TheRealEnum)evalue; };
}__attribute__((packed));
typedef struct
{
uint16_t v1;
HackedEnum enumValue;
uint16_t v2;
}__attribute__((packed)) HackedStruct;
#pragma pop()
int main(int argc, char **argv)
{
cout << "Sizes: " << endl
<< "TheRealEnum: " << sizeof(TheRealEnum) << endl
<< "ShortStruct: " << sizeof(ShortStruct) << endl
<< "LongStruct: " << sizeof(LongStruct) << endl
<< "HackedStruct: " << sizeof(HackedStruct) << endl;
ShortStruct ss;
cout << "address of ss: " << &ss << " size " << sizeof(ss) <<endl
<< "address of ss.v1: " << (void*)&ss.v1 << endl
<< "address of ss.ev: " << (void*)&ss.enumValue << endl
<< "address of ss.v2: " << (void*)&ss.v2 << endl;
LongStruct ls;
cout << "address of ls: " << &ls << " size " << sizeof(ls) <<endl
<< "address of ls.v1: " << (void*)&ls.v1 << endl
<< "address of ls.ev: " << (void*)&ls.enumValue << endl
<< "address of ls.v2: " << (void*)&ls.v2 << endl;
HackedStruct hs;
cout << "address of hs: " << &hs << " size " << sizeof(hs) <<endl
<< "address of hs.v1: " << (void*)&hs.v1 << endl
<< "address of hs.ev: " << (void*)&hs.enumValue << endl
<< "address of hs.v2: " << (void*)&hs.v2 << endl;
uint8_t buffer[512] = {0};
ShortStruct * short_ptr = (ShortStruct*)buffer;
LongStruct * long_ptr = (LongStruct*)buffer;
HackedStruct * hacked_ptr = (HackedStruct*)buffer;
short_ptr->v1 = 1;
short_ptr->enumValue = VALUE_2;
short_ptr->v2 = 3;
cout << "Values of short: " << endl
<< "v1 = " << short_ptr->v1 << endl
<< "ev = " << (int)short_ptr->enumValue << endl
<< "v2 = " << short_ptr->v2 << endl;
cout << "Values of long: " << endl
<< "v1 = " << long_ptr->v1 << endl
<< "ev = " << long_ptr->enumValue << endl
<< "v2 = " << long_ptr->v2 << endl;
cout << "Values of hacked: " << endl
<< "v1 = " << hacked_ptr->v1 << endl
<< "ev = " << hacked_ptr->enumValue << endl
<< "v2 = " << hacked_ptr->v2 << endl;
HackedStruct hs1, hs2;
// hs1.enumValue = 1; // error, the value is not the wanted enum
hs1.enumValue = VALUE_1;
int a = hs1.enumValue;
TheRealEnum b = hs1.enumValue;
hs2.enumValue = hs1.enumValue;
return 0;
}
The output on my particular system is:
Sizes:
TheRealEnum: 4
ShortStruct: 5
LongStruct: 8
HackedStruct: 5
address of ss: 0x22ff17 size 5
address of ss.v1: 0x22ff17
address of ss.ev: 0x22ff19
address of ss.v2: 0x22ff1a
address of ls: 0x22ff0f size 8
address of ls.v1: 0x22ff0f
address of ls.ev: 0x22ff11
address of ls.v2: 0x22ff15
address of hs: 0x22ff0a size 5
address of hs.v1: 0x22ff0a
address of hs.ev: 0x22ff0c
address of hs.v2: 0x22ff0d
Values of short:
v1 = 1
ev = 2
v2 = 3
Values of long:
v1 = 1
ev = 770
v2 = 0
Values of hacked:
v1 = 1
ev = 2
v2 = 3
On the application side a C++ style reinterpret_cast is capable of taking the unsigned byte array and casting it into the appropriate struct.
The layout of structs is not required to be the same between different implementations. Using reinterpret_cast in this way is not appropriate.
The 16 bit Watcom compiler will treat enumerated values as an uint8_t type. However, on the application side the enumerated values are treated as 32 bit values.
The underlying type of an enum is chosen by the implementation, and is chosen in an implementation defined manner.
This is just one of the many potential differences between implementations that can cause problems with your reinterpret_cast. There are also actual alignment issues if you're not careful, where the data in the received buffer isn't appropriately aligned for the types (e.g., an integer that requires four byte alignment ends up one byte off) which can cause crashes or poor performance. Padding might be different between platforms, fundamental types might have different sizes, endianess can differ, etc.
What I am looking for is a solution which will allow me to use a simple cast operation without having to tamper with the struct definition in the source on the application side. By doing so, I can use the struct as is in the upper layers of my application.
C++11 introduces a new enum syntax that allows you to specify the underlying type. Or you can replace your enums with integral types along with a bunch of predefined constants with manually declared values. This only fixes the problem you're asking about and not any of the other ones you have.
What you should really do is proper serialization and deserialization.
Put your enumerated type inside of a union with a 32-bit number:
union
{
Enumerated val;
uint32_t valAsUint32;
};
This would make the embedded side have it expanded to 32-bit. Should work as long as both platforms are little-endian and the structs are zero-filled initially. This would change wire format, though.
If by "simple cast operation" you mean something that's expressed in the source code, rather than something that's necessarily zero-copy, then you can write two versions of the struct -- one with enums, one with uint8_ts, and a constructor for one from the other that copies it element-by-element to repack it. Then you can use an ordinary type-cast in the rest of the code. Since the data sizes are fundamentally different (unless you use the C++11 features mentioned in another answer), you can't do this without copying things to repack them.
However, if you don't mind some small changes to the struct definition on the application side, there are a couple of options that don't involve dealing with bare uint8_t values. You could use aaronps's answer of a class that is the size of a uint8_t (assuming that's possible with your compiler) and implicitly converts to and from an enum. Alternately, you could store the values as uint8_ts and write some accessor methods for your enum values that take the uint8_t data in the struct and convert it to an enum before returning it.

TOUGH: Dealing with deeply nested pointers in C++

I define this structure:
struct s_molecule
{
std::string res_name;
std::vector<t_particle> my_particles;
std::vector<t_bond> my_bonds;
std::vector<t_angle> my_angles;
std::vector<t_dihedral> my_dihedrals;
s_molecule& operator=(const s_molecule &to_assign)
{
res_name = to_assign.res_name;
my_particles = to_assign.my_particles;
my_bonds = to_assign.my_bonds;
my_angles = to_assign.my_angles;
my_dihedrals = to_assign.my_dihedrals;
return *this;
}
};
and these structures:
typedef struct s_particle
{
t_coordinates position;
double charge;
double mass;
std::string name;
std::vector<t_lj_param>::iterator my_particle_kind_iter;
s_particle& operator=(const s_particle &to_assign)
{
position = to_assign.position;
charge = to_assign.charge;
mass = to_assign.mass;
name = to_assign.name;
my_particle_kind_iter = to_assign.my_particle_kind_iter;
return *this;
}
} t_particle;
struct s_bond
{
t_particle * particle_1;
t_particle * particle_2;
std::vector<t_bond_param>::iterator my_bond_kind_iter;
s_bond& operator=(const s_bond &to_assign)
{
particle_1 = to_assign.particle_1;
particle_2 = to_assign.particle_2;
my_bond_kind_iter = to_assign.my_bond_kind_iter;
return *this;
}
};
and then in my code I return a pointer to an s_molecule (typedef'd to t_molecule, but still).
Using this pointer I can get this code to work:
for (unsigned int i = 0;
i < current_molecule->my_particles.size();
i++)
{
std::cout << "Particle "
<< current_molecule->my_particles[i].name << std::endl
<< "Charge: "
<< current_molecule->my_particles[i].charge << std::endl
<< "Mass: "
<< current_molecule->my_particles[i].mass << std::endl
<< "Particle Kind Name: "
<< (*current_molecule->my_particles[i].my_particle_kind_iter).atom_kind_name
<< std::endl
<< "x: " << current_molecule->my_particles[i].position.x
<< " y: " << current_molecule->my_particles[i].position.y
#ifdef USE_3D_GEOM
<< "z: " << current_molecule->my_particles[i].position.z
#endif
<< std::endl;
}
If I replace it with:
for (std::vector<t_particle>::iterator it = current_molecule->my_particles.begin();
it !=current_molecule->my_particles.end();
it++)
{
std::cout << "Particle "
<< (*it).name << std::endl
<< "Charge: "
<< (*it).charge << std::endl
<< "Mass: "
<< (*it).mass << std::endl
<< "Particle Kind Name: "
<< (*(*it).my_particle_kind_iter).atom_kind_name
<< std::endl
<< "x: " << (*it).position.x
<< " y: " << (*it).position.y
#ifdef USE_3D_GEOM
<< "z: " << (*it).position.z
#endif
<< std::endl;
}
I now get nasty segfaults...
Not to put too much here, but I'm also getting segfaults when I tried to do this:
std::cout << "Bond ATOMS : "
<< (*current_molecule).my_bonds[0].particle_1->name
<< std::endl
Again, current_molecule is a pointer to a s_molecule structure, which contains arrays of structures, which in turn either directly have vars or are pointers. I can't get these multiple layers of indirection to work. Suggestions on fixing these segfaults.
FYI I'm compiling on Linux Centos 5.4 with g++ and using a custom makefile system.
#sbi Thanks for the good advice! I believe you are right -- the assignment overloaded operator is unnecessary and should be scrapped.
I've followed the approach of commenting out stuff and am very confused. Basically in the function that passes the pointer to my particular molecule to the main function to print, I can see all the data in that molecule (bonds, particles, name, etc) perfectly, printing with cout's.
Once I pass it to the main as a ptr, if I use that ptr with an iterator I get a segfault. In other words. Also for some reason the bond data (which I can freely print in my funct that returns to the pointer) also segfaults if I try to print it, even if I use the [] to index the vector of bonds (which works for the particle vector).
That's the best info I can give for now.
A wild guess: Are you using shared libraries. I remember having difficulties passing STL-containers back and forth across shared library boundaries.
Jason (OP) was asked in a comment by David Rodríguez:
Are you returning a pointer to a local variable?
Jason answered:
No its a ptr to a class variable. The class is very much in existence (it contains the function that returns the molecule).
Unless you're talking of a true class variable (qualified as static), the fact that the class exists doesn't have much to do with it. Instances of a class exist, and they might have ceased to exist even if you just called a function on them.
As such, the question is:
Does the instance of the class that returned the pointer current_molecule still exist?
Or is current_molecule qualified as static, i.e. being a true class variable?
If the answer to both questions is "no", you're in Undefined County.
At this point, it becomes very important that you post source code that can be used by us here to actually reproduce the problem; it might well be located in source you aren't showing us.
Again, this issue was answered here:
Weird Pointer issue in C++
by DeadMG. Sorry for the double post.