Unspecified number of variables within a struct - c++

I'm writing a simple program using structures. The structure I have a problem with is supposed to be flexible when it comes to the number of variables within its definition. The program is to read the counts from a file with a .txt extension:
ints: 5
strings: 6
then create a defined number of ints and strings in the structure. At first I was thinking about using pointers within the struct:
struct Data
{
int* intArr;
string* stringArr;
};
But after a quick search it turned out to be an ineffective way to go about the problem. The biggest challenge is... that I cannot use STL.
Could you please point me in the right direction?
I am hardly an experienced programmer, so I apologise if the question is inaccurate or just plain wrong.

A void* can point to any data type; void *var = new char; is valid. But void has no size, so it is useless for allocating different types together: you would have to know the type in order to dereference the pointer (and/or the length in order to iterate over it), and you don't know it unless you track all the types yourself, which, as you can imagine, is really ineffective.
If you want to allocate memory for different datatypes, do it in the right way, that is, a pointer for each type:
int * foo;
foo = new int[N];
char * bar;
bar = new char[N];
Regarding the ineffectiveness, bear in mind that you have two options: calculate the worst-case scenario and reserve the memory at compile time (i.e. char charArray[50]; int intArray[200];), or use new/delete to allocate memory based on the run-time requirements (which is what the STL is going to do under the hood, unless you use std::array).
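Here is a minimal sketch of the pointer-per-type approach applied to the struct from the question, assuming the two counts have already been read from the file. The names intCount, stringCount, makeData, intAt and freeData are made up for illustration; std::string appears only because the question's struct already holds a string*.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Holds a run-time-sized block of ints and of strings (no STL containers).
struct Data
{
    std::size_t intCount;
    std::size_t stringCount;
    int* intArr;
    std::string* stringArr;
};

// Allocates both arrays; the counts would come from the .txt file.
Data makeData(std::size_t intCount, std::size_t stringCount)
{
    Data d;
    d.intCount = intCount;
    d.stringCount = stringCount;
    d.intArr = new int[intCount]();              // value-initialised to 0
    d.stringArr = new std::string[stringCount];  // default-constructed, empty
    return d;
}

// Bounds-checked access to one of the ints.
int& intAt(Data& d, std::size_t i)
{
    assert(i < d.intCount);
    return d.intArr[i];
}

// Releases both arrays; every new[] needs a matching delete[].
void freeData(Data& d)
{
    delete[] d.intArr;
    delete[] d.stringArr;
    d.intArr = nullptr;
    d.stringArr = nullptr;
    d.intCount = 0;
    d.stringCount = 0;
}
```

With the counts from the example file you would call makeData(5, 6), fill the arrays, and call freeData once you are done.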

I did not understand your problem completely, but from what you have mentioned it seems you want to bind a variable amount of information to a structure.
Use a double pointer as a member of the structure to hold the address of an array of pointers. This should solve your problem of a variable number of members.
struct X
{
int **ptr;
};
X a;
a.ptr = (int**)malloc(sizeof(int*)*10);
This creates room for 10 pointers and assigns the address of that block to the structure's member variable. The value 10 can vary at run time.

Related

How to get an array of an Element from an array of structs

I have an array of a struct, lets say
struct cell{
int pos;
int id;
};
std::vector<cell> myArray;
I want an array of the id element. I can't just iterate over my array as it would take too long.
I have to provide std::vector<int> to a function.
My thought process was: Since arrays are usually just a pointer to the first element and then an offset I thought of creating an array where i can provide the offset, such as it would point to the id element of the next cell in std::vector<cell> myArray.
One solution I can think of is having an array of pointers to that element. The final solution might look something like:
struct cell{
int pos;
int id;
};
std::vector<cell> myArray;
std::vector<int*> pointersToIds;
// Creating an array of int from an array of int*
std::vector<int> idsArray = std::something(pointersToIds);
myFunc(idsArray);
Since the std library has tons of stuff I supposed there would be a way to do this.
Is there a way to convert the array of pointers to an actual array of elements in a very optimized way? The pointers approach was the only one I could think of, but it doesn't necessarily have to be that.
Thank you all in advance :)
I tried iterating over the array of pointers and creating an array of elements, but it would take too much time.
TLDR Get array of an element from array of struct
I suppose this might be an instance of The XY Problem, since it's not clear what you are actually trying to solve, do you:
Want to find a fast way to pass the list of struct to a function
Want a way to extract all the members from a list of struct into a list of members
First off, shoo away from your mind the idea of manually creating an array of addresses and then fiddling around with the offsets yourself, this is certainly doable, but probably hard to do yourself in a safe and portable way due to Struct Alignment, something that differs from machine to machine.
Besides, accessing cell.id already does that in a portable way by itself!
Problem 1.
If you want to pass a vector (or any object really) to a function in a fast way, you can use a reference, it would look something like this:
void foo(std::vector<cell>& in_vec);
Notice the & in the parameter declaration, which makes in_vec a reference. Internally the vector is passed by address, which avoids copying its values one by one. C++ does all this by itself; you can treat in_vec normally inside the function without a care in the world, and it's blazing fast.
Problem 2.
If your point is that you want to extract all the IDs before passing them to a function: first off, I still suggest you pass the cells, so that it is clear that foo is supposed to operate on cell IDs and not on random integers. Paying the cost of unpacking the structs outside the function (which requires an iteration) is no better, and possibly worse, than doing it inside (where you might not even need to access all cells, depending on foo's nature).
If you must carry through, it's as easy as a for loop:
std::vector<int> ids;
for(auto const& cell : myArray)
{
ids.push_back(cell.id);
}
Or, if you want an elegant and modern solution, using lambdas and algorithm:
#include <algorithm>
#include <iterator>
std::vector<int> ids;
std::transform(myArray.begin(), myArray.end(),
std::back_inserter(ids), [](cell const& c) {
return c.id;
});
Or something to this effect.
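For completeness, here is a self-contained version of the loop approach, assuming cell as defined in the question. extractIds is a made-up helper name; the only thing it adds is reserve, which avoids repeated reallocation while the vector grows.

```cpp
#include <cassert>
#include <vector>

struct cell
{
    int pos;
    int id;
};

// Copies the id of every cell into a new vector, preserving order.
std::vector<int> extractIds(const std::vector<cell>& cells)
{
    std::vector<int> ids;
    ids.reserve(cells.size());   // one allocation up front
    for (const cell& c : cells)
        ids.push_back(c.id);
    assert(ids.size() == cells.size());
    return ids;
}
```

There is still one pass over the data, which cannot be avoided if the callee really needs a std::vector<int>.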

Why is a variable length array sometimes not declared as a pointer?

I see this in code sometimes:
struct S
{
int count; // length of array in data
int data[1];
};
Where the storage for S is allocated bigger than sizeof(S) so that data can have more space for its array. It is then used like:
S *s;
// allocation
s->data[3] = 1337;
My question is, why is data not a pointer? Why the length-1 array?
If you declare data as a pointer, you'll have to allocate a separate memory block for the data array, i.e. you'll have to make two allocations instead of one. While there won't be much difference in the actual functionality, it still might have some negative performance impact. It might increase memory fragmentation. It might result in struct memory being allocated "far away" from the data array memory, resulting in the poor cache behavior of the data structure. If you use your own memory management routines, like pooled allocators, you'll have to set up two allocators: one for the struct and one for the array.
By using the above technique (known as "struct hack") you allocate memory for the entire struct (including data array) in one block, with one call to malloc (or to your own allocator). This is what it is used for. Among other things it ensures that struct memory is located as close to the array memory as possible (i.e. it is just one continuous block), so the cache behavior of the data structure is optimal.
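The single-allocation idea can be sketched in C++ roughly as follows. This is a variation rather than the data[1] form itself: the array is reached through a pointer computed just past the header, so nothing is indexed beyond a declared bound. makeS, dataOf and freeS are invented names, and the sketch assumes a plain element type such as int.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Header placed at the start of the block; the ints follow it directly.
struct S
{
    int count;   // number of ints stored after the header
};

// The trailing ints start right after the header, so the header size must
// keep them correctly aligned.
static_assert(sizeof(S) % alignof(int) == 0, "trailing ints would be misaligned");

// Returns a pointer to the first trailing int of the block.
int* dataOf(S* s)
{
    return reinterpret_cast<int*>(s + 1);
}

// One malloc covers the header and all `count` elements.
S* makeS(int count)
{
    assert(count >= 0);
    void* block = std::malloc(sizeof(S) + sizeof(int) * static_cast<std::size_t>(count));
    if (block == nullptr)
        return nullptr;
    S* s = static_cast<S*>(block);
    s->count = count;
    for (int i = 0; i < count; ++i)
        dataOf(s)[i] = 0;
    return s;
}

// One free releases the header and the array together.
void freeS(S* s)
{
    std::free(s);
}
```

The point is the same as with the struct hack: header and data live in one contiguous block, obtained and released with a single call each.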
Raymond Chen wrote an excellent article on precisely why variable length structures chose this pattern over many others (including pointers).
http://blogs.msdn.com/b/oldnewthing/archive/2004/08/26/220873.aspx
He doesn't directly comment on why an array was chosen over a pointer, but Steve Dispensa provides some insight in the comments section.
From Steve
typedef struct _TOKEN_GROUPS {
DWORD GroupCount;
SID_AND_ATTRIBUTES *Groups;
} TOKEN_GROUPS, *PTOKEN_GROUPS;
This would still force Groups to be pointer-aligned, but it's much less convenient when you think of argument marshalling.
In driver development, developers are sometimes faced with sending arguments from user-mode to kernel-mode via a METHOD_BUFFERED IOCTL. Structures with embedded pointers like this one represent anything from a security flaw waiting to happen to simply a PITA.
It's done to make it easier to manage the fact that the array is sequential in memory (within the struct). Otherwise, after the malloc that is greater than sizeof(S), you would have to point 'data' at the next memory address.
Because it lets you have code do this:
struct S
{
int count; // length of array in data
int data[1];
};
struct S * foo;
foo = malloc(sizeof(struct S) + ((len - 1)*sizeof(int)) );
memcpy(foo->data, buf, len * sizeof(int));
Which only requires one call to malloc and one call to free.
This is common enough that the C99 standard allows you to not even specify a length for the array. It's called a flexible array member.
From ISO/IEC 9899:1999, Section 6.7.2.1, paragraph 16: "As a special case, the last element of a structure with more than one named member may have an incomplete array type; this is called a flexible array member."
struct S
{
int count; // length of array in data
int data[];
};
And gcc has allowed 0 length array members as the last members of structs as an extension for a while.
Because of different copy semantics. If it is a pointer inside, then the contents have to be explicitly copied. If it is a C-style array inside, then the copy is automatic.
Incidentally, I don't think there's any guarantee that using a length-one array as something longer is going to work. A compiler would be free to generate effective-address code that relies upon the subscript being no larger than the specified bound (e.g. if an array bound is specified as one, a compiler could generate code that always accesses the first element, and if it's two, on some platforms, an optimizing compiler might turn a[i] into ((i & 1) ? a[1] : a[0])). Note that while I'm unaware of any compilers that actually do that transform, I am aware of platforms where it would be more efficient than computing an array subscript.
I think a standards-compliant approach would be to declare the array as [MAX_SIZE] and allocate sizeof(struct S)-(MAX_SIZE-len)*sizeof(int) bytes.
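To make the arithmetic concrete, here is a small sketch of that size calculation. MAX_SIZE = 100 is an arbitrary bound picked for the example, and allocationSize is a made-up helper; only the formula comes from the text above.

```cpp
#include <cassert>
#include <cstddef>

const std::size_t MAX_SIZE = 100;   // assumed upper bound, chosen for illustration

struct S
{
    int count;            // length of array in data
    int data[MAX_SIZE];
};

// Bytes needed for a struct S that only uses the first `len` elements:
// the full size minus the unused tail of the array.
std::size_t allocationSize(std::size_t len)
{
    assert(len <= MAX_SIZE);
    return sizeof(S) - (MAX_SIZE - len) * sizeof(int);
}
```

For len equal to MAX_SIZE this gives exactly sizeof(S); every element fewer saves sizeof(int) bytes.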

Uses for multiple levels of pointer dereferences?

When does using pointers in any language require someone to use more than one, let's say a triple pointer. When does it make sense to use a triple pointer instead of just using a regular pointer?
For example:
char * * *ptr;
instead of
char *ptr;
each star should be read as "which is pointed to by a pointer", so
char *foo;
is "a char which is pointed to by a pointer foo". However
char *** foo;
is "a char which is pointed to by a pointer, which is pointed to by a pointer, which is pointed to by a pointer foo". Thus foo is a pointer. At that address is a second pointer. At the address pointed to by that is a third pointer. Dereferencing the third pointer results in a char. If that's all there is to it, it's hard to make much of a case for that.
It's still possible to get some useful work done, though. Imagine we're writing a substitute for bash, or some other process control program. We want to manage our processes' invocations in an object-oriented way...
struct invocation {
char* command; // command to invoke the subprocess
char* path; // path to executable
char** env; // environment variables passed to the subprocess
...
};
But we want to do something fancy. We want to have a way to browse all of the different sets of environment variables as seen by each subprocess. To do that, we gather the env member of each invocation instance into an array env_list and pass it to the function that deals with that:
void browse_env(size_t envc, char*** env_list);
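As a rough sketch of what walking such a char*** looks like, here is a helper that counts the variables across all sets. count_env_entries is a hypothetical name, and the sketch assumes each set is a NULL-terminated array of strings, like the envp commonly passed to main.

```cpp
#include <cassert>
#include <cstddef>

// Counts the variables in all environment sets. Each set is assumed to be a
// NULL-terminated array of "NAME=value" strings.
std::size_t count_env_entries(std::size_t envc, char*** env_list)
{
    std::size_t total = 0;
    for (std::size_t i = 0; i < envc; ++i)   // first level: one set per subprocess
    {
        assert(env_list[i] != nullptr);
        // second level: one char* per variable in this set
        for (char** var = env_list[i]; *var != nullptr; ++var)
            ++total;                          // *var is the string itself
    }
    return total;
}
```

Each level of indirection has its own job here: the outer pointer selects a subprocess, the middle one a variable, and the innermost one is the string.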
If you work with "objects" in C, you probably have this:
typedef struct customer {
char *name;
char *address;
int id;
} Customer;
If you want to create an object, you would do something like this:
Customer *customer = malloc(sizeof(Customer));
// Initialise state.
We're using a pointer to a struct here because struct arguments are passed by value and we need to work with one object. (Also: Objective-C, an object-oriented wrapper language for C, uses internally but visibly pointers to structs.)
If I need to store multiple objects, I use an array:
Customer **customers = malloc(sizeof(Customer *) * 10);
int customerCount = 0;
Since an array variable in C points to the first item, I use a pointer… again. Now I have double pointers.
But now imagine I have a function which filters the array and returns a new one, and that it can't do so via the return mechanism because it must return an error code (my function accesses a database). I need to do it through a by-reference argument. This is my function's signature:
int filterRegisteredCustomers(Customer **unfilteredCustomers, Customer ***filteredCustomers, int unfilteredCount, int *filteredCount);
The function takes an array of customers and returns a reference to an array of customers (which are pointers to a struct). It also takes the number of customers and returns the number of filtered customers (again, by-reference argument).
I can call it this way:
Customer **result; int n = 0;
int errorCode = filterRegisteredCustomers(customers, &result, customerCount, &n);
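One way the body of such a function could look, written here as C++ so it can be compiled on its own. The original does not say what "registered" means or where the data comes from, so the rule id > 0 and the error code 1 are pure assumptions for the sake of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

struct Customer
{
    char *name;
    char *address;
    int id;
};

// Returns 0 on success, 1 on allocation failure. On success *filteredCustomers
// points to a malloc'd array that the caller must free.
// ASSUMPTION: "registered" means id > 0; the original post does not define it.
int filterRegisteredCustomers(Customer **unfilteredCustomers,
                              Customer ***filteredCustomers,
                              int unfilteredCount,
                              int *filteredCount)
{
    assert(unfilteredCount >= 0);
    // Room for the worst case (every customer passes); never ask for 0 bytes.
    std::size_t slots = unfilteredCount > 0 ? static_cast<std::size_t>(unfilteredCount) : 1;
    Customer **out = static_cast<Customer **>(std::malloc(sizeof(Customer *) * slots));
    if (out == nullptr)
        return 1;

    int kept = 0;
    for (int i = 0; i < unfilteredCount; ++i)
        if (unfilteredCustomers[i]->id > 0)
            out[kept++] = unfilteredCustomers[i];

    *filteredCustomers = out;   // writes the caller's Customer** through the third star
    *filteredCount = kept;
    return 0;
}
```

The third star exists only so the function can overwrite the caller's Customer** variable; inside the function the result is still just an array of pointers.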
I could go on imagining more situations… This one is without the typedef:
int fetchCustomerMatrix(struct customer ****outMatrix, int *rows, int *columns);
Obviously, I would be a horrible and/or sadistic developer to leave it that way. So, using:
typedef Customer **CustomerArray;
typedef CustomerArray *CustomerMatrix;
I can just do this:
int fetchCustomerMatrix(CustomerMatrix *outMatrix, int *rows, int *columns);
If your app is used in a hotel where you use a matrix per level, you'll probably need an array of matrices:
int fetchHotel(struct customer *****hotel, int *rows, int *columns, int *levels);
Or just this:
typedef CustomerMatrix *Hotel;
int fetchHotel(Hotel *hotel, int *rows, int *columns, int *levels);
Don't get me even started on an array of hotels:
int fetchHotels(struct customer ******hotels, int *rows, int *columns, int *levels, int *hotelCount);
…arranged in a matrix (some kind of large hotel corporation?):
int fetchHotelMatrix(struct customer *******hotelMatrix, int *rows, int *columns, int *levels, int *hotelRows, int *hotelColumns);
What I'm trying to say is that you can imagine crazy applications for multiple indirections. Just make sure you use typedef if multi-pointers are a good idea and you decide to use them.
(Does this post count as an application for a SevenStarDeveloper?)
A pointer is simply a variable that holds a memory address.
So you use a pointer to a pointer, when you want to hold the address of a pointer variable.
If you want to return a pointer, and you are already using the return variable for something, you will pass in the address of a pointer. The function then dereferences this pointer so it can set the pointer value. I.e. the parameter of that function would be a pointer to a pointer.
Multiple levels of indirection are also used for multi dimensional arrays. If you want to return a 2 dimensional array, you would use a triple pointer. When using them for multi dimensional arrays though be careful to cast properly as you go through each level of indirection.
Here is an example of returning a pointer value via a parameter:
//Not a very useful example, but shows what I mean...
bool getOffsetBy3Pointer(const char *pInput, const char **pOutput)
{
*pOutput = pInput + 3;
return true;
}
And you call this function like so:
const char *p = "hi you";
const char *pYou;
bool bSuccess = getOffsetBy3Pointer(p, &pYou);
assert(!strcmp(pYou, "you"));
ImageMagick's Wand has a function that is declared as
WandExport char* * * * * * DrawGetVectorGraphics ( const DrawingWand *)
I am not making this up.
N-dimensional dynamically-allocated arrays, where N >= 3, require three or more levels of indirection in C.
A standard use of double pointers, eg: myStruct** ptrptr, is as a pointer to a pointer. Eg as a function parameter, this allows you to change the actual structure the caller is pointing to, instead of only being able to change the values within that structure.
char ***foo can be interpreted as a pointer to a two-dimensional array of strings.
You use an extra level of indirection - or pointing - when necessary, not because it would be fun. You seldom see triple pointers; I don't think I've ever seen a quadruple pointer (and my mind would boggle if I did).
State tables can be represented by a 2D array of an appropriate data type (pointers to a structure, for example). When I wrote some almost generic code to do state tables, I remember having one function that took a triple pointer - which represented a 2D array of pointers to structures. Ouch!
int main( int argc, char** argv );
Functions that encapsulate creation of resources often use double pointers. That is, you pass in the address of a pointer to a resource. The function can then create the resource in question, and set the pointer to point to it. This is only possible if it has the address of the pointer in question, so it must be a double pointer.
If you have to modify a pointer inside a function you must pass a reference to it.
It makes sense to use a pointer to a pointer whenever the pointer actually points towards a pointer (this chain is unlimited, hence "triple pointers" etc are possible).
The reason for creating such code is because you want the compiler/interpreter to be able to properly check the types you are using (prevent mystery bugs).
You don't have to use such types: you can always use a plain "void *" and typecast whenever you need to actually dereference the pointer and access the data it points to. But that is usually bad practice and prone to errors. There are certainly cases where using void * is actually good and makes the code much more elegant, but think of it as your last resort.
=> It's mostly for helping the compiler to make sure things are used the way they are supposed to be used.
To be honest, I've rarely seen a triple-pointer.
I glanced on google code search, and there are some examples, but not very illuminating. (see links at end - SO doesn't like 'em)
As others have mentioned, double pointers you'll see from time to time. Plain single pointers are useful because they point to some allocated resource. Double pointers are useful because you can pass them to a function and have the function fill in the "plain" pointer for you.
It sounds like maybe you need some explanation about what pointers are and how they work?
You need to understand that first, if you don't already.
But that's a separate question (:
http://www.google.com/codesearch/p?hl=en#e_ObwTAVPyo/security/nss/lib/ckfw/capi/ckcapi.h&q=***%20lang:c&l=301
http://www.google.com/codesearch/p?hl=en#eVvq2YWVpsY/openssl-0.9.8e/crypto/ec/ec_mult.c&q=***%20lang:c&l=344
Pointers to pointers are rarely used in C++. They primarily have two uses.
The first use is to pass an array. char**, for instance, is a pointer to pointer to char, which is often used to pass an array of strings. Pointers to arrays don't work for good reasons, but that's a different topic (see the comp.lang.c FAQ if you want to know more). In some rare cases, you may see a third * used for an array of arrays, but it's commonly more effective to store everything in one contiguous array and index it manually (e.g. array[x+y*width] rather than array[x][y]). In C++, however, this is far less common because of container classes.
The second use is to pass by reference. An int* parameter allows the function to modify the integer pointed to by the calling function, and is commonly used to provide multiple return values. This pattern of passing parameters by reference to allow multiple returns is still present in C++, but, like other uses of pass-by-reference, is generally superseded by the introduction of actual references. The other reason for pass-by-reference - avoiding copying of complex constructs - is also possible with the C++ reference.
C++ has a third factor which reduces the use of multiple pointers: it has string. A reference to string might take the type char** in C, so that the function can change the address of the string variable it's passed, but in C++, we usually see string& instead.
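The contiguous-array indexing mentioned above (array[x+y*width]) might look like this in practice. Grid, makeGrid, at and freeGrid are names invented for the sketch; it shows one block and manual index arithmetic instead of an int** with one allocation per row.

```cpp
#include <cassert>
#include <cstddef>

// A width x height grid stored in one contiguous block.
struct Grid
{
    std::size_t width;
    std::size_t height;
    int* cells;
};

// Allocates width * height ints in a single new[]; all start at 0.
Grid makeGrid(std::size_t width, std::size_t height)
{
    Grid g = { width, height, new int[width * height]() };
    return g;
}

// Maps (x, y) to the flat index x + y * width.
int& at(Grid& g, std::size_t x, std::size_t y)
{
    assert(x < g.width && y < g.height);
    return g.cells[x + y * g.width];
}

void freeGrid(Grid& g)
{
    delete[] g.cells;
    g.cells = nullptr;
}
```

Only a single level of pointer is needed, and rows sit next to each other in memory.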
When you use nested dynamically allocated (or pointer linked) data structures. These things are all linked by pointers.
Particularly in single-threaded dialects of C which don't aggressively use type-based aliasing analysis, it can sometimes be useful to write memory managers which can accommodate relocatable objects. Instead of giving applications direct pointers to chunks of memory, the application receives pointers into a table of handle descriptors, each of which contains a pointer to an actual chunk of memory along with a word indicating its size. If one needs to allocate space for a struct woozle, one could say:
struct woozle **my_woozle = newHandle(sizeof(struct woozle));
and then access it (somewhat awkwardly in C syntax--the syntax is cleaner in Pascal): (*my_woozle)->someField=23; It's important that applications not keep direct pointers to any handle's target across calls to functions which allocate memory, but if there only exists a single pointer to every block identified by a handle the memory manager will be able to move things around in case fragmentation would become a problem.
The approach doesn't work nearly as well in dialects of C which aggressively pursue type-based aliasing, since the pointer returned by newHandle doesn't identify a pointer of type struct woozle* but instead identifies a pointer of type void*, and even on platforms where those pointer types would have the same representation the Standard doesn't require that implementations interpret a pointer cast as an indication that it should expect that aliasing might occur.
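A toy version of such a handle table, only to show the mechanics. The table layout, the fixed size kMaxHandles and the relocate function are assumptions made for this sketch (a real manager would compact memory rather than call malloc again), and newHandle here returns void**, so the caller casts on every access.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

struct woozle { int someField; };

// One descriptor per handle: where the block currently lives and how big it is.
struct HandleEntry
{
    void* block;
    std::size_t size;
};

const std::size_t kMaxHandles = 16;            // arbitrary table size for the sketch
static HandleEntry g_table[kMaxHandles] = {};  // block == nullptr marks a free slot

// Returns a pointer to the table slot's block pointer, or nullptr on failure.
void** newHandle(std::size_t size)
{
    assert(size > 0);
    for (std::size_t i = 0; i < kMaxHandles; ++i)
    {
        if (g_table[i].block == nullptr)
        {
            void* block = std::malloc(size);
            if (block == nullptr)
                return nullptr;
            g_table[i].block = block;
            g_table[i].size = size;
            return &g_table[i].block;
        }
    }
    return nullptr;
}

// Stand-in for compaction: moves one block to a new address and updates the
// table, so every access that goes through the handle keeps working.
bool relocate(void** handle)
{
    for (std::size_t i = 0; i < kMaxHandles; ++i)
    {
        if (&g_table[i].block == handle)
        {
            void* moved = std::malloc(g_table[i].size);
            if (moved == nullptr)
                return false;
            std::memcpy(moved, g_table[i].block, g_table[i].size);
            std::free(g_table[i].block);
            g_table[i].block = moved;
            return true;
        }
    }
    return false;
}
```

Because the application only ever holds the address of the table slot, the block behind it can move without the application noticing, as long as it re-reads *handle after any call that may relocate.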
Double indirection simplifies many tree-balancing algorithms, where usually one wants to be able to efficiently "unlink" a subtree from its parent. For instance, an AVL tree implementation might use:
void rotateLeft(struct tree **tree) {
struct tree *t = *tree,
*r = t->right,
*rl = r->left;
*tree = r;
r->left = t;
t->right = rl;
}
Without the "double pointer", we would have to do something more complicated, like explicitly keeping track of a node's parent and whether it's a left or right branch.
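A compilable version of the same rotation with a minimal node type, to show how it is called. The key field and the parameter name link are additions for the example; the pointer juggling is the same as above.

```cpp
#include <cassert>

struct tree
{
    int key;
    tree* left;
    tree* right;
};

// `link` is whatever pointer currently holds the subtree root: a parent's
// left or right field, or the variable holding the root of the whole tree.
void rotateLeft(tree** link)
{
    tree* t = *link;
    assert(t != nullptr && t->right != nullptr);
    tree* r = t->right;
    tree* rl = r->left;
    *link = r;       // the parent (or root variable) now points at r
    r->left = t;
    t->right = rl;
}
```

The same call works for rotateLeft(&root) and rotateLeft(&parent->right); the function never needs to know which of the two it was given.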

How should I change this declaration?

I have been given a header with the following declaration:
//The index of 1 is used to make sure this is an array.
MyObject objs[1];
However, I need to make this array dynamically sized once the program is started. I would think I should just declare it as MyObject *objs;, but I figure if the original programmer declared it this way, there is some reason for it.
Is there any way I can dynamically resize this? Or should I just change it to a pointer and then malloc() it?
Could I use the new keyword somehow to do this?
Use an STL vector:
#include <vector>
std::vector<MyObject> objs(size);
A vector is a dynamic array and is a part of the Standard Template Library. It resizes automatically as you push back objects into the array and can be accessed like a normal C array with the [] operator. Also, &objs[0] is guaranteed to point to a contiguous sequence in memory -- unlike a list -- if the container is not empty.
You're correct. If you want to dynamically instantiate its size you need to use a pointer.
(Since you're using C++ why not use the new operator instead of malloc?)
MyObject* objs = new MyObject[size];
Or should I just change it to a pointer and then malloc() it?
If you do that, how are constructors going to be called for the objects in the malloc'd memory? I'll give you a hint - they won't be - you need to use a std::vector.
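A small experiment makes the constructor point visible. MyObject here is a hypothetical stand-in that counts how many instances are alive; the real class in the question is unknown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for MyObject that tracks constructor/destructor calls.
struct MyObject
{
    static int liveCount;
    int value;
    MyObject() : value(42) { ++liveCount; }
    ~MyObject() { --liveCount; }
};
int MyObject::liveCount = 0;

// malloc hands back raw bytes: no constructor runs, so liveCount is unchanged.
int liveAfterMalloc(int n)
{
    assert(n > 0);
    void* raw = std::malloc(sizeof(MyObject) * static_cast<std::size_t>(n));
    int live = MyObject::liveCount;
    std::free(raw);
    return live;
}

// new[] constructs every element; delete[] destroys them again.
int liveAfterNew(int n)
{
    assert(n > 0);
    MyObject* objs = new MyObject[n];
    int live = MyObject::liveCount;
    assert(objs[0].value == 42);   // the constructor really ran
    delete[] objs;
    return live;
}
```

After malloc the count stays where it was, while new MyObject[n] raises it by n and delete[] brings it back down.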
I have only seen an array used as a pointer inside a struct or union. This was ages ago and was used to treat the len and first char of a string as a hash to improve the speed of string comparisons for a scripting language.
The code was similar to this:
union small_string {
struct {
char len;
char buff[1];
};
short hash;
};
Then small_string was initialised using malloc; note the C-style cast of the malloc result
small_string *str = (small_string *) malloc(sizeof(small_string) + len);
str->len = (char) len;
strcpy(str->buff, val);
And to test for equality
int fast_str_equal(const small_string *str1, const small_string *str2)
{
if (str1->hash == str2->hash)
return strcmp(str1->buff, str2->buff) == 0;
return 0;
}
As you can see this is not a very portable or safe style of c++. But offered a great speed improvement for associative arrays indexed by short strings, which are the basis of most scripting languages.
I would probably avoid this style of c++ today.
Is this at the end of a struct somewhere?
One trick I've seen is to declare a struct
struct foo {
/* optional stuff here */
int arr[1];
};
and malloc more memory than sizeof (struct foo) so that arr becomes a variable-sized array.
This was fairly commonly used in C programs back when I was hacking C, since variable-sized arrays were not available, and doing an additional allocation was considered too error-prone.
The right thing to do, in almost all cases, is to change the array to an STL vector.
Using the STL is best if you want a dynamically sizing array, there are several options, one is std::vector. If you aren't bothered about inserting, you can also use std::list.
It seems that yes, you can make this change.
But check your code for uses of sizeof( objs ):
MyObj *arr1 = new MyObj[1];
MyObj arr2[1];
sizeof(arr1) != sizeof(arr2)
Maybe this fact is used somewhere in your code.
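The difference is easy to check. In this sketch MyObj is a hypothetical element type and the two helper functions exist only to report what sizeof yields in each case.

```cpp
#include <cassert>
#include <cstddef>

struct MyObj { int payload[4]; };   // hypothetical element type

// sizeof on a real array yields the size of the whole array in bytes...
std::size_t arrayBytes()
{
    MyObj arr2[3];
    return sizeof(arr2);                // 3 * sizeof(MyObj)
}

// ...but on a pointer it yields only the size of the pointer itself.
std::size_t pointerBytes()
{
    MyObj* arr1 = new MyObj[3];
    std::size_t bytes = sizeof(arr1);   // sizeof(MyObj*), whatever the element count
    delete[] arr1;
    return bytes;
}
```

So an idiom like sizeof(objs) / sizeof(objs[0]) silently stops giving the element count once objs becomes a pointer.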
That comment is incredibly bad. A one-element array is an array even though the comment suggests otherwise.
I've never seen anybody try to enforce "is an array" this way. The array syntax is largely syntactic sugar (a[2] gives the same result as 2[a]: i.e., the third element in a (NOTE this is an interesting and valid syntax but usually a very bad form to use because you're going to confuse programmers for no reason)).
Because the array syntax is largely syntactic sugar, switching to a pointer makes sense as well. But if you're going to do that, then going with new[] makes more sense (because you get your constructors called for free), and going with std::vector makes even more sense (because you don't have to remember to call delete[] every place the array goes out of scope due to return, break, the end of statement, throwing an exception, etc.).

Managing C++ objects in a buffer, considering the alignment and memory layout assumptions

I am storing objects in a buffer. Now I know that I cannot make assumptions about the memory layout of the object.
If I know the overall size of the object, is it acceptable to create a pointer to this memory and call functions on it?
e.g. say I have the following class:
[int,int,int,int,char,padding*3bytes,unsigned short int*]
1)
if I know this class to be of size 24 and I know the address of where it starts in memory
whilst it is not safe to assume the memory layout, is it acceptable to cast this to a pointer and call functions on this object which access these members?
(Does c++ know by some magic the correct position of a member?)
2)
If this is not safe/ok, is there any other way other than using a constructor which takes all of the arguments and pulling each argument out of the buffer one at a time?
Edit: Changed title to make it more appropriate to what I am asking.
You can create a constructor that takes all the members and assigns them, then use placement new.
class Foo
{
int a;int b;int c;int d;char e;unsigned short int*f;
public:
Foo(int A,int B,int C,int D,char E,unsigned short int*F) : a(A), b(B), c(C), d(D), e(E), f(F) {}
};
...
char *buf = new char[sizeof(Foo)]; //pre-allocated buffer
Foo *obj = new (buf) Foo(a,b,c,d,e,f);
This has the advantage that even the v-table pointer will be set up correctly. Note, however, that if you are using this for serialization, the unsigned short int pointer is not going to point at anything useful when you deserialize it, unless you are very careful to use some sort of method to convert pointers into offsets and then back again.
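Two details are worth spelling out when using placement new this way: the storage has to be suitably aligned for Foo, and the object has to be destroyed by an explicit destructor call. A sketch, where constructInto, destroyIn and the sum accessor are made-up additions used only to make the example checkable:

```cpp
#include <cassert>
#include <new>      // placement new

class Foo
{
    int a; int b; int c; int d; char e; unsigned short int* f;
public:
    Foo(int A, int B, int C, int D, char E, unsigned short int* F)
        : a(A), b(B), c(C), d(D), e(E), f(F) {}
    int sum() const { return a + b + c + d; }   // added only so the sketch can be checked
};

// Constructs a Foo inside caller-supplied storage and returns it.
// The storage must be at least sizeof(Foo) bytes and aligned for Foo.
Foo* constructInto(void* storage, int a, int b, int c, int d)
{
    assert(storage != nullptr);
    return new (storage) Foo(a, b, c, d, 'x', nullptr);
}

// Objects made with placement new are destroyed by calling the destructor
// directly; the buffer itself is released separately by whoever owns it.
void destroyIn(Foo* obj)
{
    obj->~Foo();
}
```

A local buffer declared as alignas(Foo) unsigned char buf[sizeof(Foo)]; satisfies both storage requirements; memory from new char[] or malloc is also aligned for ordinary types like this one.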
Individual methods on a this pointer are statically linked and are simply a direct call to the function with this being the first parameter before the explicit parameters.
Member variables are referenced using an offset from the this pointer. If an object is laid out like this:
0: vtable
4: a
8: b
12: c
etc...
a will be accessed by dereferencing this + 4 bytes.
Basically what you are proposing doing is reading in a bunch of (hopefully not random) bytes, casting them to a known object, and then calling a class method on that object. It might actually work, because those bytes are going to end up in the "this" pointer in that class method. But you're taking a real chance on things not being where the compiled code expects them to be. And unlike Java or C#, there is no real "runtime" to catch these sorts of problems, so at best you'll get a core dump, and at worst you'll get corrupted memory.
It sounds like you want a C++ version of Java's serialization/deserialization. There is probably a library out there to do that.
Non-virtual function calls are linked directly just like a C function. The object (this) pointer is passed as the first argument. No knowledge of the object layout is required to call the function.
It sounds like you're not storing the objects themselves in a buffer, but rather the data from which they're comprised.
If this data is in memory in the order the fields are defined within your class (with proper padding for the platform) and your type is a POD, then you can memcpy the data from the buffer to a pointer to your type (or possibly cast it, but beware, there are some platform-specific gotchas with casts to pointers of different types).
If your class is not a POD, then the in-memory layout of fields is not guaranteed, and you shouldn't rely on any observed ordering, as it is allowed to change on each recompile.
You can, however, initialize a non-POD with data from a POD.
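The memcpy route for plain data can be sketched like this. Record is a hypothetical type and store/load are invented helper names; the static_assert documents the assumption that the type is trivially copyable, which is the modern way of expressing the "POD" requirement for byte-wise copies.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical plain-data record; any trivially copyable type works the same way.
struct Record
{
    int a;
    int b;
    char tag;
};

static_assert(std::is_trivially_copyable<Record>::value,
              "memcpy is only valid for trivially copyable types");

// Copies the object's bytes (padding included) into a buffer.
void store(const Record& r, unsigned char* buffer)
{
    std::memcpy(buffer, &r, sizeof(Record));
}

// Rebuilds a Record from bytes written by store(). Copying into a real Record
// avoids casting the buffer pointer, so the buffer needs no particular alignment.
Record load(const unsigned char* buffer)
{
    Record r;
    std::memcpy(&r, buffer, sizeof(Record));
    return r;
}
```

This only round-trips within the same program and platform; bytes written by a different compiler or architecture may use a different layout.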
As far as the addresses where non-virtual functions are located: they are statically linked at compile time to some location within your code segment that is the same for every instance of your type. Note that there is no "runtime" involved. When you write code like this:
class Foo{
int a;
int b;
public:
void DoSomething(int x);
};
void Foo::DoSomething(int x){a = x * 2; b = x + a;}
int main(){
Foo f;
f.DoSomething(42);
return 0;
}
the compiler generates code that does something like this:
function main:
1. allocate 8 bytes on stack for object "f"
2. call default initializer for class "Foo" (does nothing in this case)
3. push argument value 42 onto stack
4. push pointer to object "f" onto stack
5. make call to function Foo_i_DoSomething#4 (actual name is usually more complex)
6. load return value 0 into accumulator register
7. return to caller
function Foo_i_DoSomething#4 (located elsewhere in the code segment)
1. load "x" value from stack (pushed on by caller)
2. multiply by 2
3. load "this" pointer from stack (pushed on by caller)
4. calculate offset of field "a" within a Foo object
5. add calculated offset to this pointer, loaded in step 3
6. store product, calculated in step 2, to offset calculated in step 5
7. load "x" value from stack, again
8. load "this" pointer from stack, again
9. calculate offset of field "a" within a Foo object, again
10. add calculated offset to this pointer, loaded in step 8
11. load "a" value stored at offset
12. add "a" value, loaded in step 11, to "x" value loaded in step 7
13. load "this" pointer from stack, again
14. calculate offset of field "b" within a Foo object
15. add calculated offset to this pointer, loaded in step 13
16. store sum, calculated in step 12, to offset calculated in step 15
17. return to caller
In other words, it would be more or less the same code as if you had written this (specifics, such as name of DoSomething function and method of passing this pointer are up to the compiler):
class Foo{
int a;
int b;
friend void Foo_DoSomething(Foo *f, int x);
};
void Foo_DoSomething(Foo *f, int x){
f->a = x * 2;
f->b = x + f->a;
}
int main(){
Foo f;
Foo_DoSomething(&f, 42);
return 0;
}
An object having POD type, in this case, is already created (whether or not you call new; allocating the required storage already suffices), and you can access its members, including calling a function on that object. But that will only work if you know precisely the required alignment of T, the size of T (the buffer may not be smaller than it), and the alignment of all the members of T. Even for a POD type, the compiler is allowed to put padding bytes between members if it wants. For non-POD types, you can have the same luck if your type has no virtual functions or base classes and no user-defined constructor (of course), and the same applies to its bases and all its non-static members too.
For all other types, all bets are off. You have to read values out first with a POD, and then initialize a non-POD type with that data.
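A minimal sketch of that last point, assuming a made-up record format: RawRecord, Record and recordFromBuffer are hypothetical names, not something from the question. The bytes are first copied into a POD, and the non-POD object is then built from it through its normal constructor.

```cpp
#include <cstring>
#include <string>

// Hypothetical raw layout: plain data only, so it can be filled with memcpy.
struct RawRecord {
    int id;
    char name[16];
};

// Hypothetical non-POD type: it owns a std::string, so it has to be
// constructed normally instead of being overlaid on raw bytes.
class Record {
public:
    Record(int id, const std::string& name) : id_(id), name_(name) {}
    int id() const { return id_; }
    const std::string& name() const { return name_; }
private:
    int id_;
    std::string name_;
};

// Copy the bytes into a properly aligned POD, then build the real object.
// Assumes `bytes` points at no fewer than sizeof(RawRecord) bytes.
inline Record recordFromBuffer(const unsigned char* bytes) {
    RawRecord raw;
    std::memcpy(&raw, bytes, sizeof raw);
    raw.name[sizeof raw.name - 1] = '\0';  // guard against a missing terminator
    return Record(raw.id, raw.name);
}
```

Using memcpy into a local RawRecord sidesteps the alignment question for the source buffer, at the cost of one small copy.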
I am storing objects in a buffer. ... If I know the overall size of the object, is it acceptable to create a pointer to this memory and call functions on it?
This is acceptable to the extent that using casts is acceptable:
#include <iostream>
namespace {
class A {
int i;
int j;
public:
int value()
{
return i + j;
}
};
}
int main()
{
// int elements give the buffer the same size and alignment as A's two int members
int buffer[] = { 1, 2 };
std::cout << reinterpret_cast<A*>(buffer)->value() << '\n';
}
Casting an object to something like raw memory and back again is actually pretty common, especially in the C world. If you're using a class hierarchy, though, it would make more sense to use pointers to member functions.
say I have the following class: ...
if I know this class to be of size 24 and I know the address of where it starts in memory ...
This is where things get difficult. The size of an object includes the size of its data members (and any data members from any base classes) plus any padding plus any function pointers or implementation-dependent information, minus anything saved from certain size optimizations (empty base class optimization). If the resulting number is 0 bytes, then the object is required to take at least one byte in memory. These things are a combination of language issues and common requirements that most CPUs have regarding memory accesses. Trying to get things to work properly can be a real pain.
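To make the size rules a bit more concrete, here is a small sketch. The struct names and the overheadBytes helper are made up for illustration, and the static_asserts only check what the language guarantees; the actual numbers depend on your compiler and platform.

```cpp
#include <cstddef>

struct Empty {};                   // no data members at all

struct Padded {                    // hypothetical layout example
    char c;                        // 1 byte
    int  i;                        // commonly 4-byte aligned, so padding may follow c
};

struct WithVirtual {
    virtual ~WithVirtual() {}      // most implementations add a hidden vptr
    int i;
};

// Guaranteed by the language, whatever the compiler does with padding:
static_assert(sizeof(Empty) >= 1, "a complete object occupies at least one byte");
static_assert(sizeof(Padded) >= sizeof(char) + sizeof(int), "padding only adds bytes");
static_assert(sizeof(Padded) % alignof(Padded) == 0, "size is a multiple of alignment");

// Bytes of T that are not explained by the listed member sizes
// (padding, vptr, or the mandatory byte of an empty class).
template <typename T>
constexpr std::size_t overheadBytes(std::size_t memberBytes) {
    return sizeof(T) - memberBytes;
}
```

Printing overheadBytes<Padded>(sizeof(char) + sizeof(int)) or overheadBytes<WithVirtual>(sizeof(int)) on your own platform shows how much of the object is not your data.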
If you just allocate an object and cast to and from raw memory you can ignore these issues. But if you copy an object's internals to a buffer of some sort, then they rear their head pretty quickly. The code above relies on a few general rules about alignment (i.e., I happen to know that class A will have the same alignment restrictions as ints, and thus the array can be safely cast to an A; but I couldn't necessarily guarantee the same if I were casting parts of the array to A's and parts to other classes with other data members).
Oh, and when copying objects you need to make sure you're properly handling pointers.
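As a rough sketch of what "handling pointers" means here, assume a hypothetical Message type that points at text stored elsewhere. Copying the struct byte for byte would only copy the address, so the sketch writes the pointed-to characters into the buffer instead. The names serialize and deserialize are made up for this example.

```cpp
#include <cstddef>
#include <cstring>

struct Message {            // hypothetical type holding a pointer
    std::size_t length;
    const char* text;       // points at data outside the object
};

// Write the length followed by the characters themselves, not the pointer.
// Assumes `out` has room for sizeof(std::size_t) + m.length bytes.
// Returns the number of bytes written.
inline std::size_t serialize(const Message& m, unsigned char* out) {
    std::memcpy(out, &m.length, sizeof m.length);
    std::memcpy(out + sizeof m.length, m.text, m.length);
    return sizeof m.length + m.length;
}

// Read the length back and point `text` at the characters inside the buffer.
// The result is only valid while the buffer is alive.
inline Message deserialize(const unsigned char* in) {
    Message m;
    std::memcpy(&m.length, in, sizeof m.length);
    m.text = reinterpret_cast<const char*>(in + sizeof m.length);
    return m;
}
```

The deserialized Message no longer refers to the original string, which is the whole point: the buffer can be moved to another process or machine and still be read back.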
You may also be interested in things like Google's Protocol Buffers or Facebook's Thrift.
Yes these issues are difficult. And, yes, some programming languages sweep them under the rug. But there's an awful lot of stuff getting swept under the rug:
In Sun's HotSpot JVM, object storage is aligned to the nearest 64-bit boundary. On top of this, every object has a 2-word header in memory. The JVM's word size is usually the platform's native pointer size. (An object consisting of only a 32-bit int and a 64-bit double -- 96 bits of data -- will require) two words for the object header, one word for the int, two words for the double. That's 5 words: 160 bits. Because of the alignment, this object will occupy 192 bits of memory.
This is because Sun is relying on a relatively simple tactic for memory alignment issues (on an imaginary processor, a char may be allowed to exist at any memory location, an int at any location that is divisible by 4, and a double may need to be allocated only on memory locations that are divisible by 32 -- but the most restrictive alignment requirement also satisfies every other alignment requirement, so Sun is aligning everything according to the most restrictive location).
Another tactic for memory alignment can reclaim some of that space.
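The arithmetic in the quoted description can be checked with a few lines of code. This is only a sketch of the rounding described above, not of how any JVM is implemented; alignUp and objectFootprintBits are hypothetical helper names.

```cpp
#include <cstddef>

// Round `size` up to the next multiple of `alignment` (alignment must be nonzero).
constexpr std::size_t alignUp(std::size_t size, std::size_t alignment) {
    return (size + alignment - 1) / alignment * alignment;
}

// Footprint in bits of an object with a 2-word header plus `payloadWords`
// words of fields, rounded up to `alignBits`, following the quoted description.
constexpr std::size_t objectFootprintBits(std::size_t wordBits,
                                          std::size_t payloadWords,
                                          std::size_t alignBits) {
    return alignUp((2 + payloadWords) * wordBits, alignBits);
}

// 32-bit words: 2 header words + 1 word (int) + 2 words (double) = 160 bits,
// which rounds up to 192 bits on a 64-bit boundary.
static_assert(objectFootprintBits(32, 3, 64) == 192, "matches the quoted figure");
```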
If the class contains no virtual functions (and therefore class instances have no vptr), and if you make correct assumptions about the way in which the class' member data is laid out in memory, then doing what you're suggesting might work (but might not be portable).
Yes, another way (more idiomatic but not much safer ... you still need to know how the class lays out its data) would be to use the so-called "placement operator new" and a default constructor.
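For reference, a minimal sketch of placement new, assuming a hypothetical Point class and storage type. Unlike the cast-based approach, the default constructor here initializes the members, so the sketch does not rely on whatever bytes were in the buffer beforehand.

```cpp
#include <new>      // placement new

class Point {       // hypothetical class for illustration
public:
    Point() : x_(0), y_(0) {}
    void set(int x, int y) { x_ = x; y_ = y; }
    int sum() const { return x_ + y_; }
private:
    int x_;
    int y_;
};

// Storage with the size and alignment of Point; no Point lives here yet.
struct PointStorage {
    alignas(Point) unsigned char bytes[sizeof(Point)];
};

// Construct a Point inside caller-provided storage and return a pointer to it.
inline Point* constructIn(PointStorage& storage) {
    return new (storage.bytes) Point();   // runs the default constructor
}

// Objects created with placement new are destroyed by calling the destructor
// explicitly; the storage itself is not freed.
inline void destroy(Point* p) {
    p->~Point();
}
```

The alignas on the storage is what makes the buffer safe for a Point; a plain char array carries no such guarantee.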
That depends upon what you mean by "safe". Any time you cast a memory address into a pointer in this way you are bypassing the type safety features provided by the compiler, and taking the responsibility on yourself. If, as Chris implies, you make an incorrect assumption about the memory layout, or compiler implementation details, then you will get unexpected results and lose portability.
Since you are concerned about the "safety" of this programming style it is likely worth your while to investigate portable and type-safe methods such as pre-existing libraries, or writing a constructor or assignment operator for the purpose.