What is pointer swizzling? - c++

I was reading about integer overflow on Wikipedia and came across the term "pointer swizzling" in the "See also" section.
I googled pointer swizzling but couldn't understand it.
Can anyone please explain what pointer swizzling is?

The Wikipedia page explains this, but let me say it another way.
Say you have a binary tree data structure in memory and want to save the structure to disk.
You cannot simply write the structure to disk as-is, because the pointers it contains are memory addresses, which mean nothing on disk.
Also, when you later read the binary tree from disk back into memory, the addresses
used in the original in-memory copy of the tree might already be in use in the new process.
Pointer swizzling is converting handles (such as indices or file offsets) to pointers when reading the disk data back into memory.
The reverse step, converting pointers to handles when writing from memory to disk, is usually called unswizzling. The pointers you get back after a round trip are generally different from the original ones.
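As a rough illustration of that round trip, here is a minimal sketch for a binary tree. The names `Node`, `Record`, `flatten` and `rebuild` are made up for this example, and the handle is assumed to be simply the node's index in an array of records. Writing the records to an actual file is left out.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// In-memory node: links are raw pointers (hypothetical type for illustration).
struct Node {
    int value = 0;
    Node* left = nullptr;
    Node* right = nullptr;
};

// Disk-friendly record: links are indices into the record array, -1 = no child.
struct Record {
    int value;
    std::int32_t left;
    std::int32_t right;
};

// Pointers -> handles (the write direction). Returns the index given to n.
std::int32_t flatten(const Node* n, std::vector<Record>& out) {
    if (!n) return -1;
    const std::int32_t self = static_cast<std::int32_t>(out.size());
    out.push_back({n->value, -1, -1});
    // Use temporaries: the recursive calls may reallocate the vector.
    const std::int32_t l = flatten(n->left, out);
    const std::int32_t r = flatten(n->right, out);
    out[self].left = l;
    out[self].right = r;
    return self;
}

// Handles -> pointers (the read direction). The returned vector owns the
// freshly allocated nodes; element 0 is the root if the input is non-empty.
std::vector<std::unique_ptr<Node>> rebuild(const std::vector<Record>& in) {
    std::vector<std::unique_ptr<Node>> nodes;
    nodes.reserve(in.size());
    for (const Record& r : in) {
        nodes.push_back(std::make_unique<Node>());
        nodes.back()->value = r.value;
    }
    // Every node now has a new address, so indices can become pointers.
    for (std::size_t i = 0; i < in.size(); ++i) {
        nodes[i]->left  = in[i].left  < 0 ? nullptr : nodes[in[i].left].get();
        nodes[i]->right = in[i].right < 0 ? nullptr : nodes[in[i].right].get();
    }
    return nodes;
}
```

The rebuilt nodes live at whatever addresses the allocator hands out in the new process, which is why the handles, not the old addresses, are what gets stored.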

Related

"Thinking in C++": operation of "Stash" library

Stash library in "Thinking in C++" by Bruce Eckel:
Basically he seems to be setting up an array-index-addressable interface (via fetch) to a set of entities that are actually stored at random memory locations, without actually copying those data entities, in order to simulate for the user the existence of an actual contiguous-memory data block. In short, a contiguous, indexed address map. Do I have this right? Also, his mapping is on a byte-by-byte basis; if it were not for this requirement (and I am unsure of its importance), I believe that there may be simpler ways to generate such a data structure in C++. I looked into memcpy, but do not see how to actually copy data on a byte-by-byte basis to create such an indexed structure.
Prior posting:
This library appears to create a pointer assemblage, not a data-storage assemblage.
Is this true? (Applies to both the C and C++ versions.) Thus the name "stash" might be a little misleading, as nothing but pointers to data stashed elsewhere is put into a "stash," and Eckel states that "the data is copied."
Background: Looking at “add” and “inflate,” the so-called “copying” is equating pointers to other pointers (“storage” to “e” in “add” and “b” to “storage” in “inflate”). The use of “new” in this case is strange to me, because storage for data is indeed allocated but “b” is set to the address of the data, and no data assignments seem to take place in the entire library. So I am not sure what the point of the “allocation” by “new” is when the allocated space is apparently never written into or read from in the library. The “element” to be added exists elsewhere in memory already, and seemingly all we are doing is creating a sequential pointer structure to each byte of every “element” desired to be reference-able through CStash. Do I understand this library correctly?
Similarly, it looks as though “stack” in the section “Nested structures” appears actually to work only with addresses of data, not with data. I wrote my own linked-list stack successfully, which actually stores data in the stack nodes.

Win 32 or boost memory map access

I was told about memory mapped files as a possible way to get fast file i/o to store a 2d game tile map. The game will have frequent updates to the data where I will know the row/col to update so I can get direct access that way in the array. However looking at some examples I don't understand how this would work.
Does anyone have a small example of creating, reading, & writing to a memory map file of a struct, where the result would be a 1D array so I can access it for my game as map[row * MAX_ROW + col].tileID = x; for example. Boost or Win 32 would be fine I don't have a preference, but I find the examples online to be somewhat confusing and often have a hard time converting them to my desired result.
There's an example here that looks somewhat understandable: Problem with boost memory mapped files: they go to disk instead of RAM
Note the .data() member, which gives you a char*. You could cast this to a pointer to an array of whatever you want, given enough mapped memory, and go wild.
That said, I highly suspect that memory-mapped files are the wrong solution here. Why not just load your level using normal C++ (vector, classes, ifstreams, etc.), modify it however you like, and write it out again when you're done if you want the changes saved to disk?

How to read info from a binary data file into an array of structures C++

I am a beginning C++ student. I have a structure array that holds employee info.
I can put values into the structure, write those values to a binary dat file and
read the values back into the program so that it can be displayed to the console.
Here is my problem. Once I close the program, I can't get the file to read the data from the file back into memory - instead it reads "garbage."
I tried a few things and then read this in my book:
NOTE: Structures containing pointers cannot be correctly stored to
disk using the techniques of this section. This is because if the
structure is read into memory on a subsequent run of the program, it
cannot be guaranteed that all program variables will be at the same
memory locations.
I am pretty sure this is what is going on when I try to open a .dat file with previously stored information and try to read it into a structure array.
I can send my code examples if that would help clarify my question.
Any suggestions would be appreciated.
Speaking generally (since I don't have your code), there are two reasons you generally shouldn't just write the bytes of a struct or class to a file:
1) As your book mentioned, writing a pointer to disk is pointless, since you're just storing an address that will mean nothing on the next run, not the data at that address. You need to write out the data itself.
2) You especially should not attempt to write a struct/class all at once with something like fwrite(&myStruct, sizeof(myStruct), 1, file). Compilers sometimes put empty bytes between variables in structs in order to let the processor read them faster; this is called padding. Using a different compiler or compiling for a different computer architecture can pad structures differently, so a file that opens correctly on one computer might not open correctly on another.
There are lots of ways you can write out data to a file, be it in a binary format or in some human-readable format like XML. Regardless of what method you choose to use (each has strengths and weaknesses), every one of them involves writing each piece of data you care to save one by one, then reading them back in one by one. Higher-level languages like Java or C# have ways to do this automagically by storing metadata about objects, but this comes at the cost of more memory usage and slower program execution.

Query about memory location

Suppose there is a variable a and a pointer p which holds the address of a.
int a;
int *p=&a;
Now since I have a pointer pointing to the location of the variable, I know the exact memory location (or the chunk of memory).
My questions are:
Given an address, can we find which variable is using it? (I don't think this is possible.)
Given an address, can we at least find how big the chunk of memory is to which that address belongs? (I know this is stupid, but still.)
You can enumerate all your (suspect) variables and check whether their addresses match your pointer (i.e. you can compare pointers for equality).
If your pointer is defined as int *p, you can assume it points to an integer. Your assumption can be proven wrong, of course, if for example the pointer value is null or you meddled with the value of the pointer.
You can think of memory as a big array of bytes.
Now, if you have a pointer to somewhere in the middle of that array, can you tell me how many other pointers point to the same location? Can you tell me how much information I stored at the location it points to? Can you at least tell me what kind of object is stored there? None of these can be answered from the pointer alone. Some languages add extra information to their memory management routines so that they can track such things later, but C++ aims for minimal overhead, so the answer is no, it is not possible.
For your first question, smart pointers can help: shared_ptr keeps a reference count of how many shared_ptr instances own an object so that it can control the object's lifetime. You can read that count with use_count(), but it only counts shared_ptr owners, not raw pointers to the same location.
There are non-standard, platform-dependent functions for querying the size of a dynamically allocated block (for example _msize on Windows, malloc_usable_size on Linux with glibc, and malloc_size on macOS), but they only work on memory allocated with malloc, must be given the pointer malloc returned, and are not portable. The C++ philosophy is that you don't pay for what you don't use: if you need this feature, implement a solution for it, and if you don't need it, you never pay its extra cost.
Given an address, can we find which variable is using it?
No, this isn't possible. Variables refer to memory, not the other way around. There is no way to get variable names from compiled code, except maybe via the symbol table or debug information, and reading those would probably mean messing around at the assembly level.
Given an address, can we at least find how big the chunk of memory is
to which that memory address belongs?
No. There isn't a way to do that given just the address. If you know the pointer's type, sizeof(*p) gives you the size of the pointed-to type, but that comes from the type, not from the address itself.
Question 1.
A: It cannot be done natively, but it could be done with Valgrind's memcheck tool. Its virtual machine tracks all allocated memory and the stack. It is not designed to answer this kind of question, but with some modification it could. For example, it can already correlate an invalid memory access or a leaked block with a location in the source code. So, given a valid and known memory address, it should in principle be able to find the corresponding variable.
Question 2.
A: It can be done as above, but it can also be done natively with a preloaded library (for example via LD_PRELOAD) that wraps malloc, calloc, strdup, free, etc. By instrumenting the memory allocation functions yourself, you can record each allocated address and its size. You can also record the return address with __builtin_return_address() or backtrace() to know where each chunk was allocated. Store every allocated address and size in a tree. You can then look up which chunk an address belongs to, how big that chunk is, and which function allocated it.
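The lookup part of that idea can be sketched with an ordered map, which is a balanced tree under the hood. `AllocationRegistry` is a hypothetical name, and the sketch assumes something else (such as wrapped allocation functions) calls `record` and `forget`. It deliberately does not hook malloc itself.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical registry of live allocations, keyed by start address.
class AllocationRegistry {
public:
    // Call after a successful allocation of `size` bytes at `p`.
    void record(const void* p, std::size_t size) { chunks_[key(p)] = size; }

    // Call when the chunk starting at `p` is freed.
    void forget(const void* p) { chunks_.erase(key(p)); }

    // Returns true and fills base/size if addr lies inside a recorded chunk.
    bool find(const void* addr, const void*& base, std::size_t& size) const {
        const std::uintptr_t a = key(addr);
        auto it = chunks_.upper_bound(a);    // first chunk starting after addr
        if (it == chunks_.begin()) return false;
        --it;                                // last chunk starting at or before addr
        if (a - it->first >= it->second) return false;  // past the chunk's end
        base = reinterpret_cast<const void*>(it->first);
        size = it->second;
        return true;
    }

private:
    static std::uintptr_t key(const void* p) {
        return reinterpret_cast<std::uintptr_t>(p);
    }
    std::map<std::uintptr_t, std::size_t> chunks_;
};
```

Storing the caller's return address next to the size would be a small extension of the map's value type.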

Qt - Serializing a tree structure (a lot of pointers)

I have a tree structure with a lot of pointers, basically a node of the tree is like this
class Node
{
Node *my_father;
QVector<Node*> my_children;
// ... a lot of data
};
I need all these pointers to make my job easier while the tree is in RAM. But now I need to save the whole tree structure to disk. I was thinking about using QDataStream serialization (http://www.developer.nokia.com/Community/Wiki/Qt_Object_Serialization), but I don't think this is going to work with pointers, right?
What would you suggest to save this big structure to disk and read it back into RAM with the pointers working?
Why don't you use the XML format? By design it is very easy to use with structured data that has nested objects, like your tree structure. But you don't want to store pointers in it, just the actual data. (The relationships your pointers describe become the nesting of the XML itself, so you don't need to store them.)
Then you'll need to recreate the pointers while reading the file, as you allocate each new child for a node.
BTW sorry for making this answer and not comment but I can't write question comments yet ;].
Clearly, there is no guarantee that pointers read from disk would ever be valid as such. But you could still use them as 'integer IDs', as follows. To write, save the pointers to disk along with the rest of the data. In addition, for each class instance, save its own address to disk. This will be that object's 'integer ID'. To read,
1) Use the saved integer ID information to associate each object with its children and father. Initially, you'll probably have to read all of your Nodes into a single big list.
2) Then, once the children and father are in memory, write their actual addresses into my_children and my_father respectively.
Feels a bit hacky to me but I can't think of a more direct way to get at this.
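A minimal sketch of this "old address as ID" scheme follows. To stay self-contained it uses std::vector instead of QVector and keeps the saved records in memory instead of going through QDataStream. `SavedNode`, `idOf`, `save` and `load` are names invented for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <unordered_map>
#include <vector>

// Stand-in for the question's Node, with std::vector replacing QVector.
struct Node {
    Node* father = nullptr;
    std::vector<Node*> children;
    int payload = 0;
};

// What would be written per node: its old address as an ID, plus the old
// addresses of its relatives. These are only ever used as integers.
struct SavedNode {
    std::uint64_t id;
    std::uint64_t fatherId;                 // 0 when there is no father
    std::vector<std::uint64_t> childIds;
    int payload;
};

std::uint64_t idOf(const Node* n) {
    return n ? static_cast<std::uint64_t>(reinterpret_cast<std::uintptr_t>(n)) : 0;
}

void save(const Node* n, std::vector<SavedNode>& out) {
    SavedNode s{idOf(n), idOf(n->father), {}, n->payload};
    for (const Node* c : n->children) s.childIds.push_back(idOf(c));
    out.push_back(s);
    for (const Node* c : n->children) save(c, out);
}

// The returned vector owns the new nodes, in the same order as the input.
std::vector<std::unique_ptr<Node>> load(const std::vector<SavedNode>& in) {
    std::vector<std::unique_ptr<Node>> nodes;
    std::unordered_map<std::uint64_t, Node*> byId;
    // Step 1: create every node and remember which new node each old ID maps to.
    for (const SavedNode& s : in) {
        nodes.push_back(std::make_unique<Node>());
        nodes.back()->payload = s.payload;
        byId[s.id] = nodes.back().get();
    }
    // Step 2: translate the old IDs into the new addresses.
    for (std::size_t i = 0; i < in.size(); ++i) {
        Node* n = nodes[i].get();
        n->father = in[i].fatherId ? byId.at(in[i].fatherId) : nullptr;
        for (std::uint64_t c : in[i].childIds) n->children.push_back(byId.at(c));
    }
    return nodes;
}
```

The old addresses are never dereferenced after loading. They only serve as keys in the lookup table.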