I'm trying to figure out how to make a linked list that lives inside a single byte array, so that each element I put into the byte array can be enqueued and dequeued. However, I need to figure out how to do this using pointer offsets and linked lists.
My question is:
How do I get an offset of a set amount from the start of a pointer? For example, let's say the beginning of my list is at one pointer. I would start by just checking if that space is empty, if not, get the next value in the list. How do I offset from a current pointer position and get a new pointer location that is basically just an offset of another pointer, forward or backwards, up or down, left and right, plus or minus.
Someone asked for an example:
byte myData[1024];
I have to store all of my data into this. This is for a class assignment. Essentially, I have to use this array to store any and all of my data to it, and basically create a queue, like the standard c++ queue. I have to create Enqueue() and Dequeue() functions and then dynamically allocate the memory for each. I have a general idea of what I'm doing. I'm stuck on trying to figure out how to take a pointer of my current position, and then set it to a new position, and then have that be my "next" in the list.
It sounds like what you really want is pointer arithmetic. It's simple enough.
std::int32_t foo[] = {42, 350};
std::int32_t* intPtr = foo; // foo decays to a pointer to its first element. We'll say foo is at address 0x005
++intPtr; // Or intPtr += 1, either way the value of intPtr is now 0x009
// *intPtr would now give you 350.
// Your program knows the type being pointed to, and bumps up the address
// accordingly. In this case a 4-byte integer
When doing pointer arithmetic on a C-array, it's important to have checks in place to stop you going out of bounds on either side. However, I don't even think pointer arithmetic is necessary. If you're storing an array privately, simply using index access and tracking what index your list ends at is a lot simpler. You still have to do checks, but they're easier checks.
You're also saying linked list, but describing an array list. They are two very different data structures. Your queue will be a lot easier to write if you write a separate array list class, and store an array list object in your queue instead of a raw array.
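As a rough sketch of the index-based approach described above: the class below keeps a queue of single bytes inside one fixed 1024-byte array and tracks positions as offsets from the start of the array rather than as raw pointers. The name ByteQueue and its members are made up for illustration, and it assumes every element is exactly one byte, which is simpler than what the assignment may require.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical fixed-capacity FIFO queue of bytes stored in one array.
// head_ and count_ are offsets from the start of data_, not pointers.
class ByteQueue {
public:
    static constexpr std::size_t kCapacity = 1024;

    // Adds a byte at the tail; returns false if the buffer is full.
    bool enqueue(unsigned char value) {
        if (count_ == kCapacity) return false;
        std::size_t tail = (head_ + count_) % kCapacity;  // wrap around the end
        data_[tail] = value;
        ++count_;
        return true;
    }

    // Removes the oldest byte into out; returns false if the queue is empty.
    bool dequeue(unsigned char& out) {
        if (count_ == 0) return false;
        out = data_[head_];
        head_ = (head_ + 1) % kCapacity;  // advance the head offset
        --count_;
        return true;
    }

    // Peeks at the oldest byte; the queue must not be empty.
    unsigned char front() const {
        assert(count_ > 0);
        return data_[head_];
    }

    std::size_t size() const { return count_; }

private:
    unsigned char data_[kCapacity] = {};
    std::size_t head_ = 0;   // offset of the oldest element
    std::size_t count_ = 0;  // number of stored elements
};
```

Because only offsets are stored, every access goes through data_[index], and the two bounds checks (full and empty) are the only ones needed.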
How do I get an offset of a set amount from the start of a pointer?
Read the C++11 standard n3337 about pointer arithmetic. Notice the existence of offsetof in C++.
If you have two short* ptr1; and short* ptr2; pointers which contain a valid address, you might code ptr1 - ptr2 or ptr1 + 5 or ptr2 - 3 (however, ptr1+ptr2 is forbidden). The C++11 standard explains when that is valid (sometimes it is not, e.g. when ptr2 is nullptr, or when the two pointers do not point into the same array). Notice also that in general &ptr1[3] is the same as ptr1+3 and ptr2[-1] is exactly *(ptr2-1) (when that makes sense).
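The identities above can be tried out with a small sketch. The helper names distance_between and offset_within are hypothetical; the second one shows one possible way to apply a forward or backward offset while checking that the result stays inside the array (or one past its end).

```cpp
#include <cassert>
#include <cstddef>

// Number of elements between two pointers into the same array.
std::ptrdiff_t distance_between(const short* first, const short* last) {
    return last - first;
}

// Moves current by a signed element offset, asserting that the result
// stays within [base, base + length].
const short* offset_within(const short* base, std::size_t length,
                           const short* current, std::ptrdiff_t offset) {
    std::ptrdiff_t index = (current - base) + offset;
    assert(index >= 0 && static_cast<std::size_t>(index) <= length);
    return base + index;
}
```

For an array short values[5] and p = values + 2, offset_within(values, 5, p, -2) yields a pointer to values[0], and p[-1] reads the same element as *(p - 1).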
Beware of undefined behavior in your code, such as buffer overflows (and you will have one if you do pointer arithmetic carelessly: beware of segmentation faults).
Tools like address sanitizers, debuggers (such as GDB), and Valgrind should be helpful to understand the behavior of your code.
Don't forget to enable warnings and debug info in your C++ compiler. Once your C++ code compiles without warnings, read how to debug small programs. With GCC, compile with g++ -Wall -Wextra -g. Notice that GCC 10 adds some static analysis abilities. And you could use the Clang static analyzer or Frama-C (or develop your own GCC plugin).
The linked list wikipage has a nice figure. The wikipage on tries could help you also.
I recommend reading a good C++ programming book and then some introduction to algorithms.
On github or elsewhere you can find tons of examples of C++ code related to your question (whose terminology is confusing to non-native English speakers).
Academic papers about memory shape analysis (such as this one or that one) contain figures which would improve your understanding. Books or web resources about garbage collection are also relevant.
I have heard quite a lot about storing external data in pointer.
For example in (short string optimization).
For example:
when we want to overload << for our SSO class, depending on the length of the string we want to print either the value of the pointer or the string.
Instead of creating a bool flag we could encode this flag inside the pointer itself. If I am not mistaken, this is possible thanks to the PC architecture, which adds padding to prevent unaligned memory access.
But I have yet to see it in an example. How could we detect such a flag, when binary operations such as & to check whether the MSB or LSB is set to 1 (as a flag) are not allowed on pointers? Also, wouldn't this mess up dereferencing the pointer?
All answers are appreciated.
It is quite possible to do such things (unlike what others have said). Most modern architectures (x86-64, for example) enforce alignment requirements that allow you to use the fact that the least significant bits of a pointer may be assumed to be zero, and make use of that storage for other purposes.
Let me pause for a second and say that what I'm about to describe is considered 'undefined behavior' by the C & C++ standards. You are going off-the-rails in a non-portable way by doing what I describe, but there are more standards governing the rules of a computer than the C++ standard (such as the processor's assembly reference and architecture docs). Caveat emptor.
With the assumption that we're working on x86_64, let us say that you have a class/structure that starts with a pointer member:
struct foo {
bar * ptr;
/* other stuff */
};
By the alignment rules of the x86-64 ABI (the hardware itself tolerates most unaligned accesses, but compilers lay objects out this way), that pointer in foo must be aligned on an 8-byte boundary. In this trivial example, you can assume that every pointer to a struct foo is therefore an address divisible by 8, meaning the lowest 3 bits of a foo * will be zero.
In order to take advantage of such a constraint, you must play some casting games to allow the pointer to be treated as a different type. There's a bunch of different ways of performing the casting, ranging from the old C method (not recommended) of casting it to and from a uintptr_t to cleaner methods of wrapping the pointer in a union. In order to access either the pointer or ancillary data, you need to logically 'and' the datum with a bitmask that zeros out the part of the datum you don't wish.
As an example of this explanation, I wrote an AVL tree a few years ago that sinks the balance book-keeping data into a pointer, and you can take a look at that example here: https://github.com/jschmerge/structures/blob/master/tree/avl_tree.h#L31 (everything you need to see is contained in the struct avl_tree_node at the line I referenced).
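A much smaller sketch of the same masking idea, separate from the AVL tree linked above: the hypothetical TaggedPtr class below stores a 3-bit tag in the low bits of a pointer by going through std::uintptr_t. It assumes that the platform's pointer-to-integer conversion exposes alignment as zero low bits, which holds on mainstream platforms but is not promised by the C++ standard. Node and all member names are invented for the example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical payload type, forced to 8-byte alignment so that the
// lowest 3 bits of any valid Node* are assumed to be zero.
struct alignas(8) Node {
    int value;
};

// Packs a Node* and a 3-bit tag into one integer (non-portable sketch).
class TaggedPtr {
public:
    TaggedPtr(Node* p, unsigned tag) {
        std::uintptr_t raw = reinterpret_cast<std::uintptr_t>(p);
        assert((raw & kTagMask) == 0);  // low bits must be free
        assert(tag <= kTagMask);        // tag must fit in 3 bits
        bits_ = raw | tag;
    }

    // Masks the tag off before converting back to a usable pointer.
    Node* pointer() const {
        return reinterpret_cast<Node*>(bits_ & ~kTagMask);
    }

    // Extracts only the low 3 bits.
    unsigned tag() const {
        return static_cast<unsigned>(bits_ & kTagMask);
    }

private:
    static constexpr std::uintptr_t kTagMask = 0x7;
    std::uintptr_t bits_;
};
```

The stored value is never dereferenced directly; pointer() always clears the tag bits first, which is how this answers the worry about breaking dereferencing.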
Swinging back to a topic you mentioned in your initial question... Short string optimization isn't implemented quite the same way. The implementations of it in Clang and GCC's standard libraries differ somewhat, but both boil down to using a union to overload a block of storage with either a pointer or an array of bytes, and play some clever tricks with the string's internal length field for differentiating whether the data is a pointer or local array. For more of the details, this blog post is rather good at explaining: https://shaharmike.com/cpp/std-string/
"encode this flag inside pointer itself"
No, you are not allowed to do this in either C or C++.
The behaviour on setting (let alone dereferencing) a pointer to memory you don't own is undefined in either language.
Sadly what you want to achieve is to be done at the assembler level, where the distinction between a pointer and integer is sufficiently blurred.
My background is C++ and I'm currently about to start developing in C# so am doing some research. However, in the process I came across something that raised a question about C++.
This C# for C++ developers guide says that
In C++ an array is merely a pointer.
But this StackOverflow question has a highly-upvoted comment that says
Arrays are not pointers. Stop telling people that.
The cplusplus.com page on pointers says that arrays and pointers are related (and mentions implicit conversion, so they're obviously not the same).
The concept of arrays is related to that of pointers. In fact, arrays work very much like pointers to their first elements, and, actually, an array can always be implicitly converted to the pointer of the proper type.
I'm getting the impression that the Microsoft page wanted to simplify things in order to summarise the differences between C++ and C#, and in the process wrote something that was simpler but not 100% accurate.
But what have arrays got to do with pointers in the first place? Why is the relationship close enough for them to be summarised as the "same" even if they're not?
The cplusplus.com page says that arrays "work like" pointers to their first element. What does that mean, if they're not actually pointers to their first element?
There is a lot of bad writing out there. For example the statement:
In C++ an array is merely a pointer.
is simply false. How can such bad writing come about? We can only speculate, but one possible theory is that the author learned C++ by trial and error using a compiler, and formed a faulty mental model of C++ based on the results of his experiments. This is possibly because the syntax used by C++ for arrays is unconventional.
The next question is, how can a learner know if he/she is reading good material or bad material? Other than by reading my posts of course ;-) , participating in communities like Stack Overflow helps to bring you into contact with a lot of different presentations and descriptions, and then after a while you have enough information and experience to make your own decisions about which writing is good and which is bad.
Moving back to the array/pointer topic: my advice would be to first build up a correct mental model of how object storage works when we are working in C++. It's probably too much to write about just for this post, but here is how I would build up to it from scratch:
C and C++ are designed in terms of an abstract memory model, however in most cases this translates directly to the memory model provided by your system's OS or an even lower layer
The memory is divided up into basic units called bytes (usually 8 bits)
Memory can be allocated as storage for an object; e.g. when you write int x; it is decided that a particular block of adjacent bytes is set aside to store an integer value. An object is any region of allocated storage. (Yes this is a slightly circular definition!)
Each byte of allocated storage has an address which is a token (usually representable as a simple number) that can be used to find that byte in memory. The addresses of any bytes within an object must be sequential.
The name x only exists during the compilation stage of a program. At runtime there can be int objects allocated that never had a name; and there can be other int objects with one or more names during compilation.
All of this applies to objects of any other type, not just int
An array is an object which consists of many adjacent sub-objects of the same type
A pointer is an object which serves as a token identifying where another object can be found.
From here on in, C++ syntax comes into it. C++'s type system uses strong typing which means that each object has a type. The type system extends to pointers. In almost all situations, the storage used to store a pointer only saves the address of the first byte of the object being pointed to; and the type system is used at compilation time to keep track of what is being pointed to. This is why we have different types of pointer (e.g. int *, float *) despite the fact that the storage may consist of the same sort of address in both cases.
Finally: the so-called "array-pointer equivalence" is not an equivalence of storage, if you understood my last two bullet points. It's an equivalence of syntax for looking up members of an array.
Since we know that a pointer can be used to find another object; and an array is a series of many adjacent objects; then we can work with the array by working with a pointer to that array's first element. The equivalence is that the same processing can be used for both of the following:
Find Nth element of an array
Find Nth object in memory after the one we're looking at
and furthermore, those concepts can be both expressed using the same syntax.
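A small sketch of the two lookups named above, using hypothetical helper names. The first function only sees a pointer to the first element; the second takes a reference to the whole array, so the array type (including its size) is still known inside it.

```cpp
#include <cassert>
#include <cstddef>

// "Find the Nth object in memory after the one we're looking at."
// Only a pointer is available here; the array size is not known.
int nth_via_pointer(const int* first, std::size_t n) {
    return *(first + n);
}

// "Find the Nth element of an array."
// The parameter is a reference to an array of N ints, so N is known.
template <std::size_t N>
int nth_via_array(const int (&arr)[N], std::size_t n) {
    assert(n < N);
    return arr[n];
}
```

Given int data[4] = {1, 2, 3, 4}, both nth_via_pointer(data, 2) and nth_via_array(data, 2) read the same element, while sizeof(data) remains 4 * sizeof(int) because data itself is an array, not a pointer.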
They are most definitely not the same thing at all, but in this case, confusion can be forgiven because the language semantics are ... flexible and intended for the maximum confusion.
Let's start by simply defining a pointer and an array.
A pointer (to a type T) points to a memory space which holds at least one T (assuming non-null).
An array is a memory space that holds multiple Ts.
A pointer points to memory, and an array is memory, so you can point inside or to an array. Since you can do this, pointers offer many array-like operations. Essentially, you can index any pointer on the presumption that it actually points to memory for more than one T.
Therefore, there's some semantic overlap between (pointer to) "Memory space for some Ts" and "Points to a memory space for some Ts". This is true in any language- including C#. The main difference is that they don't allow you to simply assume that your T reference actually refers to a space where more than one T lives, whereas C++ will allow you to do that.
Since all pointers to a T can be pointers to an array of T of arbitrary size, you can treat pointers to an array and pointers to a T interchangeably. The special case of a pointer to the first element is that the "some Ts" for the pointer and "some Ts" for the array are equal. That is, a pointer to the first element yields a pointer to N Ts (for an array of size N) and a pointer to the array yields ... a pointer to N Ts, where N is equal.
Normally, this is just interesting memory crapping-around that nobody sane would try to do. But the language actively encourages it by converting the array to the pointer to the first element at every opportunity, and in some cases where you ask for an array, it actually gives you a pointer instead. This is most confusing when you want to actually use the array like a value, for example, to assign to it or pass it around by value, when the language insists that you treat it as a pointer value.
Ultimately, all you really need to know about C++ (and C) native arrays is, don't use them, pointers to arrays have some symmetries with pointers to values at the most fundamental "memory as an array of bytes" kind of level, and the language exposes this in the most confusing, unintuitive and inconsistent way imaginable. So unless you're hot on learning implementation details nobody should have to know, then use std::array, which behaves in a totally consistent, very sane way and just like every other type in C++. C# gets this right by simply not exposing this symmetry to you (because nobody needs to use it, give or take).
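To illustrate the value-like behaviour of std::array mentioned above, here is a short sketch with hypothetical function names. Passing by value copies the elements, and assignment and comparison work as they do for other types.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Takes the array by value: the caller's array is copied, not aliased.
std::array<int, 3> doubled(std::array<int, 3> values) {
    for (int& v : values) {
        v *= 2;
    }
    return values;
}

// Bounds-checked element access written out by hand for clarity.
int element(const std::array<int, 3>& values, std::size_t i) {
    assert(i < values.size());
    return values[i];
}
```

After std::array<int, 3> a = {1, 2, 3}; and auto b = doubled(a);, a still holds 1, 2, 3 while b holds 2, 4, 6, and a plain c = b; copies all three elements.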
Arrays and pointers in C and C++ can be used with the exact same semantics and syntax in the vast majority of cases.
That is achieved by one feature:
Arrays decay to pointers to their first element in nearly all contexts.
Exceptions in C: sizeof, _Alignof, _Alignas, address-of &
In C++, the difference can also be important for overload-resolution.
In addition, array notation for function arguments is deceptive, these function-declarations are equivalent:
int f(int* a);
int f(int a[]);
int f(int a[3]);
But not to this one:
int f(int (&a)[3]);
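The equivalence can be checked at compile time. In this sketch the functions are given distinct, made-up names (instead of overloading f) so that decltype can name each one unambiguously.

```cpp
#include <cassert>
#include <type_traits>

int takes_pointer(int* a);
int takes_unsized_array(int a[]);
int takes_sized_array(int a[3]);

// All three declarations above have the same function type, int(int*).
static_assert(std::is_same<decltype(takes_pointer),
                           decltype(takes_unsized_array)>::value,
              "int a[] is adjusted to int*");
static_assert(std::is_same<decltype(takes_pointer),
                           decltype(takes_sized_array)>::value,
              "int a[3] is adjusted to int*");

// A reference to an array of exactly 3 ints is a different parameter type.
int sum_by_reference(int (&a)[3]) {
    return a[0] + a[1] + a[2];
}
static_assert(!std::is_same<decltype(takes_pointer),
                            decltype(sum_by_reference)>::value,
              "int (&)[3] is not int*");

// With a plain pointer parameter the caller must supply the length.
int sum_by_pointer(const int* a, int count) {
    assert(a != nullptr && count >= 0);
    int total = 0;
    for (int i = 0; i < count; ++i) total += a[i];
    return total;
}
```

sum_by_reference only accepts arrays of exactly three ints, whereas sum_by_pointer accepts any pointer and has to trust the count it is given.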
Besides what has already been told, there is one big difference:
pointers are variables to store memory addresses, and they can be incremented or decremented and the values they store can change (they can point to any other memory location). That's not the same for arrays; once they are allocated, you can't change the memory region they reference, e.g. you cannot assign other values to them:
int my_array[10];
int x = 2;
my_array = &x; // error: an array cannot be assigned to
my_array++; // error: an array cannot be incremented
Whereas you can do the same with a pointer:
int *p = my_array;
p++;
p = &x;
The meaning in this guide was simply that in C# an array is an object (perhaps like in STL that we can use in C++), while in C++ an array is basically a sequence of variables located & allocated one after the other, and that's why we can refer to them using a pointer (pointer++ will give us the next one etc.).
it's as simple as:
int arr[10];
int* arr_pointer1 = arr;
int* arr_pointer2 = &arr[0];
so, since arrays are contiguous in memory, writing
arr[1];
is the same as writing:
*(arr_pointer1+1)
pushing things a bit further, writing:
arr[2];
//resolves to
*(arr+2);
//note also that this is perfectly valid
2[arr];
//resolves to
*(2+arr);
What is the internal data structure of a QVarLengthArray?
For example, if I were to have:
QVarLengthArray<QString> anArray;
QString string1 = "whatever";
QString string2 = "something else";
anArray.append(string1); // anArray[0]; a default-constructed QVarLengthArray starts empty
anArray.append(string2); // anArray[1]
Is it easy to pre-calculate &anArray[1] given &anArray?
I have been traipsing through the QVarLengthArray source code trying to understand how QVarLengthArray stores an array of QStrings in memory. As much as I like Qt, one thing that is particularly painful to me is its opaque pointer basis. (The helper functions in the debugger help in some cases, but when really trying to dig into the internals, the opaque pointers obscure a great deal of information that would otherwise be available in the debugger.)
I found a couple "Qt Internals" articles on codeproject.com and elsewhere, but none helped.
In general, it would be great to have a way to peer into the real data structures behind the opaque pointers, but for the immediate need it would be great to understand if there is a good way to predict the start address of each element in the QVarLengthArray of MyClass which contains pointers, QStrings, and integers.
(Having this information will help simplify a custom serialization. I do understand the risks to reusability and am willing to accept those risks for this experiment.)
Look under the "private" section of the class headers to find the member variables -- these will tell you the class structure. Here's a link to the first member of QVarLengthArray: http://code.woboq.org/qt5/qtbase/src/corelib/tools/qvarlengtharray.h.html#QVarLengthArray::a
In Qt 5, a QVarLengthArray starts with 2 ints, followed by a pointer to the first element in the array, followed by a union which holds the actual array itself, preallocated on the stack.
If your array size is less than or equal to the preallocated capacity, then &(array[1]) is simply a fixed number of bytes after &anArray. However, if your array grows bigger than the preallocated capacity, then QVarLengthArray will switch to the heap instead. When this happens, there is no longer any relationship between &(array[1]) and &anArray.
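The inline-versus-heap behaviour described above can be illustrated with a simplified stand-in. SmallIntArray below is a hypothetical class written for this example; it is not Qt's implementation, it stores only ints, and its member layout is an assumption chosen to mirror the description (counters, then a pointer, then the preallocated buffer).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical small-buffer array: elements live inside the object until
// the preallocated capacity is exceeded, then move to the heap.
class SmallIntArray {
public:
    static constexpr std::size_t kPrealloc = 4;

    SmallIntArray() : capacity_(kPrealloc), size_(0), ptr_(inline_) {}
    ~SmallIntArray() { if (ptr_ != inline_) delete[] ptr_; }
    SmallIntArray(const SmallIntArray&) = delete;
    SmallIntArray& operator=(const SmallIntArray&) = delete;

    void append(int value) {
        if (size_ == capacity_) grow();
        ptr_[size_++] = value;
    }

    int& operator[](std::size_t i) {
        assert(i < size_);
        return ptr_[i];
    }

    // True once the elements no longer live inside the object itself.
    bool on_heap() const { return ptr_ != inline_; }

private:
    void grow() {
        std::size_t new_capacity = capacity_ * 2;
        int* bigger = new int[new_capacity];
        std::memcpy(bigger, ptr_, size_ * sizeof(int));
        if (ptr_ != inline_) delete[] ptr_;
        ptr_ = bigger;
        capacity_ = new_capacity;
    }

    std::size_t capacity_;
    std::size_t size_;
    int* ptr_;              // points at inline_ or at a heap block
    int inline_[kPrealloc]; // preallocated storage inside the object
};
```

While on_heap() is false, the address of element 0 lies inside the object, at a fixed offset from its start; after the fifth append the elements live in a separate heap block and only ptr_ says where.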
If you have &anArray, a robust way to find &(anArray[1]) is as follows:
QString* anArray_0 = (&anArray)->begin(); // &(anArray[0])
QString* anArray_1 = anArray_0 + 1; // &(anArray[1])
Or, to do it the low-level way without calling any member functions (assuming there's no padding):
// Use reinterpret_cast to enable 1-byte pointer arithmetic
char* outerPtr = reinterpret_cast<char*>(&anArray);
QString* anArray_0 = *reinterpret_cast<QString**>( outerPtr + 2*sizeof(int) ); // read the stored pointer: &(anArray[0])
QString* anArray_1 = anArray_0 + 1; // &(anArray[1])
(Having this information will help simplify a custom serialization. I do understand the risks to reusability and am willing to accept those risks for this experiment.)
Qt promises source- and binary-compatibility across minor releases. The structure of QVarLengthArray is guaranteed to remain unchanged until Qt 6, at least.
In general, it would be great to have a way to peer into the real data structures behind the opaque pointers
I find the Woboq Code Browser very useful for this -- the source code becomes an interactive web of hyperlinks, and you can search for any class in the library. Just look in the class header to find the opaque pointer, and click on it.
Stash library in "Thinking in C++" by Bruce Eckel:
Basically he seems to be setting up an array-index-addressable interface (via fetch) to a set of entities that are actually stored at random memory locations, without actually copying those data entities, in order to simulate for the user the existence of an actual contiguous-memory data block. In short, a contiguous, indexed address map. Do I have this right? Also, his mapping is on a byte-by-byte basis; if it were not for this requirement (and I am unsure of its importance), I believe that there may be simpler ways to generate such a data structure in C++. I looked into memcpy, but do not see how to actually copy data on a byte-by-byte basis to create such an indexed structure.
Prior posting:
This library appears to create a pointer assemblage, not a data-storage assemblage.
Is this true? (Applies to both the C and C++ versions.) Thus the name "stash" might be a little misleading, as nothing but pointers to data stashed elsewhere is put into a "stash," and Eckel states that "the data is copied."
Background: Looking at “add” and “inflate,” the so-called “copying” is equating pointers to other pointers (“storage” to “e” in “add” and “b” to “storage” in “inflate”). The use of “new” in this case is strange to me, because storage for data is indeed allocated but “b” is set to the address of the data, and no data assignments seem to take place in the entire library. So I am not sure what the point of the “allocation” by “new” is when the allocated space is apparently never written into or read from in the library. The “element” to be added exists elsewhere in memory already, and seemingly all we are doing is creating a sequential pointer structure to each byte of every “element” desired to be reference-able through CStash. Do I understand this library correctly?
Similarly, it looks as though “stack” in the section “Nested structures” appears actually to work only with addresses of data, not with data. I wrote my own linked-list stack successfully, which actually stores data in the stack nodes.
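For comparison, here is a minimal sketch of a container that does copy the bytes of each element into its own storage and hands back pointers into that storage. ByteStash is a hypothetical class, not Eckel's code, and it uses std::vector in place of manual new and inflate logic.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical byte-level stash: every element is element_size bytes,
// copied into one contiguous block owned by the stash.
class ByteStash {
public:
    explicit ByteStash(std::size_t element_size)
        : element_size_(element_size) {
        assert(element_size > 0);
    }

    // Copies element_size_ bytes starting at element; returns the new index.
    std::size_t add(const void* element) {
        const unsigned char* bytes =
            static_cast<const unsigned char*>(element);
        storage_.insert(storage_.end(), bytes, bytes + element_size_);
        return storage_.size() / element_size_ - 1;
    }

    // Returns the address of the stored copy, or nullptr if out of range.
    const void* fetch(std::size_t index) const {
        if (index >= storage_.size() / element_size_) return nullptr;
        return storage_.data() + index * element_size_;
    }

private:
    std::size_t element_size_;
    std::vector<unsigned char> storage_;
};
```

Because add copies the bytes, changing the original variable afterwards does not change what fetch returns; reading the copy back with std::memcpy into a properly typed variable avoids alignment problems.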
Let's say I have 2 pointers pointing to the same memory location. If I know what the address is, how can I find out which pointers are pointing to that location?
int x=5;
int* p1=&x;
int* p2=&x;
How do I get the address of p1 and p2? Is it possible to even do this in C/C++? If not then is it possible to search through all pointers and see which ones have the value of &x?
No, it's not possible to "backtrack" a pointer in C or C++ (a good rule of thumb is that if a feature has big hidden performance costs, then it's not present in C or C++).
As for the second approach (going through memory looking for pointers), that is precisely what some tools like the Boehm garbage collector do. However, not only is this process inefficient and not portable, but it can also lead to "false positives", since you can't tell if a byte pattern in memory is a real pointer or something else like a regular integer or part of a string.
Anyway, you should ask yourself what is the real problem you need to solve instead of trying to hack a garbage collector on your own. Depending on what you want to do there are many ways to approach it in C++ (RAII, smart pointers, etc)
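As one example of the smart-pointer route: std::shared_ptr cannot tell you which pointers refer to an object, but it does track how many shared_ptr instances currently share ownership of it. The helper name owner_count is hypothetical.

```cpp
#include <cassert>
#include <memory>

// Reports how many shared_ptr instances currently share ownership
// of the object that p refers to.
long owner_count(const std::shared_ptr<int>& p) {
    assert(p != nullptr);
    return p.use_count();
}
```

With auto p1 = std::make_shared<int>(5); and auto p2 = p1;, owner_count(p1) reports two owners, and it drops back to one after p2.reset(). Raw pointers obtained through get() are not counted.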