I understand pointers are used to point to objects so you would have to the same around in a program. But were and how are pointer names stored. Would it be an overkill to declare a pointer name that occupies more resource than the object it points to for example:
int intOne = 0;
int *this_pointer_is_pointing_towards_intOne = &intOne;
I know this is a ridiculous example but i was just trying to get the idea across.
Edit: the name of the pointer has to be stored somewhere taking more bytes than the address of the pointed at object.
The length of the variable name doesn't have any effect on the size of your program, just the length of time it takes to write the program.
The name of local variables are only needed for the compiler to find the variables you want to refer to. After compiling, those names usually are erased and completely replaced by numeric symbols or equivalents. This happens for all names that have no linkage practically (of course if you do a debug build, things may be different). So, the same is true for function parameters.
The name of global variables, for example, can't be erased, because you may use it from another unit in your program, and the linker has to be able to look it up. But after your program has been linked, even the name of those can be erased.
And after all, these do not occupy runtime memory. Those names are stored in a reallocation table for the purpose of linking (see the strip program how to remove those names).
But anyway, we are talking about a few bytes which are already wasted by alignment and whatnot. Compare that to the hell-long names of template instantiations. Try out this:
readelf -sW /usr/lib/libboost_*-mt.so | awk '{ print length($0), $0 }' | sort -n
Pointer names are not stored. The pointer name (or any variable name for that matter) are not compiled into the final binary (provided you don't compile with symbols set to on).
Pointers are simply integers (or longs) that are stored in memory which, in turn, point to the item they are pointing to in some location in memory.
Microsoft uses the "p" prefix to indicate a pointer:
int intOne = 0;
int* pIntOne = &intOne;
They actually use Hungarian on everything.
It works pretty well once you get used to seeing it. Many people think it is ugly at first.
Whatever you decide, I think it's valuable to denote that something is a pointer type in its name, though I wouldn't go quite so far as your example. :)
I agree with previous posts but I would like to point out something around it. Sometimes people overuse pointers thinking that their use will automatically provide small memory footprints. This is not always the case. Consider this piece of code:
void myfunc(const char *var) {
// Function body
}
A pointer to char will take 4 bytes in a 32 bits architecture while the char itself would usually take 1 byte. (Here var is assumed to point to a single byte, not to a string.) Can you see the point? On the other hand, you should always use pointers (or references) for complex objects:
void myfunc(const string &str) {
// Function body
}
Of course, in case you want to modify the variable inside your function you should remove the const keyword.
I don't see a reason why it should be preceded with anything at all. Your compiler/editor will do the job for you.
Well I understand what you are asking here. Its good to remember that the variable names only affect the size of the text file and the time it took you to write that. But do remember that variable names and pointer names and function names etc.. all get lost in translation to machine code by the GNU GCC compiler. The final thing that will be stored in .o or .exe will be 0s and 1s, so you shouldn't worry about names being to big :)
Well pointers are most often used for things that are allocated dynamically, and/or functions.
In the case of dynamic allocation, you can just precede it with ptr, if you're insistent on using this notation. In the case of functions, fun works just as well.
There are two critical points here that I would like to atempt to distill out:
The length of a variable name has no impact on the space allocated for the data in the final compiled C/C++ program as the compiler comes up with its own names for variables.
N.B. Some compilers, especially the older ones, may only recognise the first few letters of a variable name so having very long overly complicated variable names in your source code may lead to clashes
Also, if you create symbol files for debugging they will contain all your variable names so will become unecessarily large but this might not significantly slow down debugging, I've never checked!
The space needed to store a pointer may indeed be larger than the space needed than the data pointed to but there are still circumstances in which you may wish to use them in any case. For instance in an architecture such as COM where the practise is that functions return only a result code (success or failure) and all data that is changed is done through pointers passed on the stack:
/*
pszString could be pointing to only one character which
is less space than the pointer
*/
HRESULT OneLetterSplat(char * pszString)
{
*pszString = 'a';
return SUCCESS
}
Related
I've been doing some reading on immutable strings in general and in c++, here, here, and I think I have a decent understanding of how things work. However I have built a few assumptions that I would just like to run by some people for verification. Some of the assumptions are more general than the title would suggest:
While a const string in c++ is the closest thing to an immutable string in STL, it is only locally immutable and therefore doesn't experience the benefit of being a smaller object. So it has all the trimmings of a standard string object but it can't access all of the member functions. This means that it doesn't create any optimization in the program over non-const? But rather just protects the object from modification? I understand that this is an important attribute but I'm simply looking to know what it means to use this
I'm assuming that an object's member functions exist only once in read-only memory, and how is probably implementation specific, so does a const object have a separate location in memory? Or are the member functions limited in another way? If there are only 'const string' objects and no non-const strings in a code base, does the compiler leave out the inaccessible functions?
I recall hearing that each string literal is stored only once in read-only memory in c++, however I don't find anything on this here. In other words, if I use some string literal multiple times in the same program, each instance references the same location in memory. I'm going to assume no, but would two string objects initialized by the same string literal point to the same string until one is modified?
I apologize if I have included too many disjunct thoughts in the same post, they are all related to me as string representation and just learning how to code better.
As far as I know, std::string cannot assume that the input string is a read-only constant string from your data segment. Therefore, point (3) does not apply. It will most likely allocate a buffer and copy the string in the buffer.
Note that C++ (like C) has a const qualifier for compilation time, it is a good idea to use it for two reasons: (a) it will help you find bugs, a statement such as a = 5; if a is declared const fails to compile; (b) the compile may be able to optimize the code more easily (it may otherwise not be able to figure out that the object is constant.)
However, C++ has a special cast to remove the const-ness of a variable. So our a variable can be cast and assigned a value as in const_cast<int&>(a) = 5;. An std::string can also get its const-ness removed. (Note that C does not have a special cast, but it offers the exact same behavior: * (int *) &a = 5)
Are all class members defined in the final binary?
No. std::string as most of the STL uses templates. Templates are compiled once per unit (your .o object files) and the link will reduce duplicates automatically. So if you look at the size of all the .o files and add them up, the final output will be a lot small.
That also means only the functions that are used in a unit are compiled and saved in the object file. Any other function "disappear". That being said, often function A calls function B, so B will be defined, even if you did not explicitly call it.
On the other hand, because these are templates, very often the functions get inlined. But that is a choice by the compiler, not the language or the STL (although you can use the inline keyword for fun; the compiler has the right to ignore it anyway).
Smaller objects... No, in C++ an object has a very specific size that cannot change. Otherwise the sizeof(something) would vary from place to place and C/C++ would go berserk!
Static strings that are saved in read-only data sections, however, can be optimized. If the linker/compiler are good enough, they will be able to merge the same string in a single location. These are just plan char * or wchar_t *, of course. The Microsoft compiler has been able to do that one for a while now.
Yet, the const on a string does not always force your string to be put in a read-only data section. That will generally depend on your command line option. C++ may have corrected that, but I think C still put everything in a read/write section unless you use the correct command line option. That's something you need to test to make sure (your compiler is likely to do it, but without testing you won't know.)
Finally, although std::string may not use it, C++ offers a quite interesting keyword called mutable. If you heard about it, you would know that a variable member can be marked as mutable and that means even const functions can modify that variable member. There are two main reason for using that keyword: (1) you are writing a multi-thread program and that class has to be multi-thread safe, in that case you mark the mutex as mutable, very practical; (2) you want to have a buffer used to cache a computed value which is costly, that buffer is only initialized when that value is requested to not waste time otherwise, that buffer is made mutable too.
Therefore the "immutable" concept is only really something that you can count on at a higher level. In practice, reality is often quite different. For example, an std::string c_str() function may reallocate the buffer to add the necessary '\0' terminator, yet that function is marked as being a const:
const CharT* c_str() const;
Actually, an implementation is free to allocate a completely different buffer, copy its existing data to that buffer and return that bare pointer. That means internally the std::string could be allocate many buffers to store large strings (instead of using realloc() which can be costly.)
Once thing, though... when you copy string A into string B (B = A;) the string data does not get copied. Instead A and B will share the same data buffer. Once you modify A or B, and only then, the data gets copied. This means calling a function which accepts a string by copy does not waste that much time:
int func(std::string a)
{
...
if(some_test)
{
// deep copy only happens here
a += "?";
}
}
std::string b;
func(b);
The characters of string b do not get copied at the time func() gets called. And if func() never modifies 'a', the string data remains the same all along. This is often referenced as a shallow copy or copy on write.
Quite likely this has been asked/answered before, but not sure how to phrase it best, a link to a previously answered question would be great.
If you define something like
char myChar = 'a';
I understand that this will take up one byte in memory (depending on implementation and assuming no unicode and so on, the actual number is unimportant).
But I would assume the compiler/computer would also need to keep a table of variable types, addresses (i.e. pointers), and possibly more. Otherwise it would have the memory reserved, but would not be able to do anything with it. So that's already at least a few more bytes of memory consumed per variable.
Is this a correct picture of what happens, or am I misunderstanding what happens when a program gets compiled/executed? And if the above is correct, is it more to do with compilation, or execution?
The compiler will keep track of the properties of a variable - its name, lifetime, type, scope, etc. This information will exist in memory only during compilation. Once the program has been compiled and the program is executed, however, all that is left is the object itself. There is no type information at run-time (except if you use RTTI, then there will be some, but only because you required it for your program to function - such as is required for dynamic_casting).
Everything that happens in the code that accesses the object has been compiled into a form that treats it exactly as a single byte (because it's a char). The address that the object is located at can only be known at run-time anyway. However, variables with automatic storage duration (like local variables), are typically located simply by some fixed offset from the current stack frame. That offset is hard-baked into the executable.
Wether a variable contains extra information depends on the type of the variable and your compiler options. If you use RTTI, extra information is stored. If you compile with debug information then there will also extra overhead be added.
For native datatypes like your example of char there is usually no overhead, unless you have structs which also can cotnain padding bytes. If you define classes, there may be a virtual table associated with your class. However, if you dynamically allocate memory, then there usually will be some overhead along with your allocated memory.
Somtimes a variable may not even exist, because the optimizer realizes that there is no storage needed for it, and it can wrap it up in a register.
So in total, you can not rely on counting your used variables and sum their size up to calculate the amount of memory it requires because there is not neccessarily a 1:1: relation.
Some types can be detected in compile type, say in this code:
void foo(char c) {...}
it is obvious what type of variable c in compile time is.
In case of inheritance you cannot know the real type of the variable in the compile type, like:
void draw(Drawable* drawable); // where drawable can be Circle, Line etc.
But C++ compiler can help to determine the type of the Drawable using dynamic_cast. In this case it uses pointer to a virtual method tables, associated with an object to determine the real type.
these are the some silly question ..i want to ask..please help me to comprehend it
const int i=100; //1
///some code
long add=(long)&i; //2
Doubt:for the above code..will compiler first go through the whole code
for deciding whether memory should be allocated or not..or first it ll store the
variable in read only memory place and then..allocate stroage as well at 2
doubt:why taking address of variable enforce compiler to store variable on memory..even
though rom or register too have address
In your code example, add contains the address, not the value, of i. I believe you may have thought that i was not stored in normal memory unless/until you take its address. This is not the case.
const does not mean the value is stored in ROM. It is stored in normal memory (often the stack) just like any other variable. const means the compiler will go to some lengths to prevent you from modifying the value.
const is not, and was never intended, to be some sort of security mechanism. If you obtain the address of the memory and want to modify it, you can do so. Of course this is almost always a bad idea, but if you really need to do it, it is possible.
I never wrote a compiler implementing this, but I think that it would be simple to just handle the variable as a normal variable but using the constant value where the variable value is used and using the address of the variable if the address is used.
If at the end of the scope of the variable no one took the address then I can just drop it instead of doing a real allocation because for all other uses the constant value has been used instead of compiling a variable loading operation.
constant values (not the only use for const, but the one used here) are not 'stored in normal memory' (nor in ROM, of course). the compiler simply uses the value (100 in this case) whenever the code uses the variable.
Of course, if the value isn't stored anywhere, there's no meaning of an address for the constant.
Other uses of const are stored in 'normal memory', and you can take their address, but the result is a 'pointer to const value', so it's (in principle) unusable for modification of the value. A hard cast would of course change that, so they trigger a nasty compiler warning.
also, remember that the C/C++ compiler operates totally at compile time (by definition!), it's nothing unusual that some use at a later part affects the code generation of an early part.
A very obvious example is the declaration of stack variables: the compiler has to take into account all the variables declared at any given level to be able to generate the stack allocation at the block entry.
I am a little confused about what you are asking but looking at your code:
i = 100 with a address of 0x?????????????
add = whatever the address is stored as a long int
There is no (dynamic) memory allocation in this code. The two local variables are created on stack. The address of i is taken and brutally cast into long, which is then assigned to the second variable.
How would you locate an object in memory, lets say that you have a struct defined as:
struct POINT {
int x;
int y;
};
How would I scan the memory region of my app to find instances of this struct so that I can read them out?
Thanks R.
You can't without adding type information to the struct. In memory a struct like that is nothing else than 2 integers so you can't recognize them any better than you could recognize any other object.
You can't. Structs don't store any type information (unless they have virtual member functions), so you can't distinguish them from any other block of sizeof(POINT) bytes.
Why don't you store your points in a vector or something?
You can't. You have to know the layout to know what section of memory have to represent a variable. That's a kind of protocol and that's why we use text based languages instead raw values.
You don't - how would you distinguish two arbitrary integers from random noise?
( but given a Point p; in your source code, you can obtain its address using the address-of operator ... Point* pp = &p;).
Short answer: you can't. Any (appropriately aligned) sequence of 8 bytes could potentially represent a POINT. In fact, an array of ints will be indistinguishable from an array of POINTS. In some cases, you could take advantage of knowledge of the compiler implementation to do better. For instance, if the struct had virtual functions, you could look for the correct vtable pointer - but there could also be false positives.
If you want to keep track of objects, you need to register them in their constructor and unregister them in their destructor (and pay the performance penalty), or give them their own allocator.
There's no way to identify that struct. You need to put the struct somewhere it can be found, on the stack or on the heap.
Sometimes data structures are tagged with identifying information to assist with debugging or memory management. As a means of data organization, it is among the worst possible approaches.
You probably need to a lot of general reading on memory management.
There is no standard way of doing this. The platform may specify some APIs which allow you to access the stack and the free store. Further, even if you did, without any additional information how would you be sure that you are reading a POINT object and not a couple of ints? The compiler/linker can read this because it deals with (albeit virtual) addresses and has some more information (and control) than you do.
You can't. Something like that would probably be possible on some "tagged" architecture that also supported tagging objects of user-defined types. But on a traditional architecture it is absolutely impossible to say for sure what is stored in memory simply by looking at the raw memory content.
You can come closer to achieving what you want by introducing a unique signature into the type, like
struct POINT {
char signature[8];
int x;
int y;
};
and carefully setting it to some fixed and "unique" pattern in each object of POINT type, and then looking for that pattern in memory. If it is your application, you can be sure with good degree of certainty that each instance of the pattern is your POINT object. But in general, of course, there will never be any guarantee that the pattern you found belongs to your object, as opposed to being there purely accidentally.
What everyone else has said is true. In memory, your struct is just a few bytes, there's nothing in particular to distinguish it.
However, if you feel like a little hacking, you can look up the internals of your C library and figure out where memory is stored on the heap and how it appears. For example, this link shows how stuff gets allocated in one particular system.
Armed with this knowledge, you could scan your heap to find allocated blocks that were sizeof(POINT), which would narrow down the search considerably. If you look at the table you'll notice that the file name and line number of the malloc() call are being recorded - if you know where in your source code you're allocating POINTs, you could use this as a reference too.
However, if your struct was allocated on the stack, you're out of luck.
I have a list of objects that I would like to store in a file as small as possible for later retrieval. I have been carefully reading this tutorial, and am beginning (I think) to understand, but have several questions. Here is the snippet I am working with:
static bool writeHistory(string fileName)
{
fstream historyFile;
historyFile.open(fileName.c_str(), ios::binary);
if (historyFile.good())
{
list<Referral>::iterator i;
for(i = AllReferrals.begin();
i != AllReferrals.end();
i++)
{
historyFile.write((char*)&(*i),sizeof(Referral));
}
return true;
} else return false;
}
Now, this is adapted from the snippet
file.write((char*)&object,sizeof(className));
taken from the tutorial. Now what I believe it is doing is converting the object to a pointer, taking the value and size and writing that to the file. But if it is doing this, why bother doing the conversions at all? Why not take the value from the beginning? And why does it need the size? Furthermore, from my understanding then, why does
historyFile.write((char*)i,sizeof(Referral));
not compile? i is an iterator (and isn't an iterator a pointer?). or simply
historyFile.write(i,sizeof(Referral));
Why do i need to be messing around with addresses anyway? Aren't I storing the data in the file? If the addresses/values are persisting on their own, why can't i just store the addresses deliminated in plain text and than take their values later?
And should I still be using the .txt extension? < edit> what should I use instead then? I tried .dtb and was not able to create the file. < /edit> I actually can't even seem to get file to open without errors with the ios::binary flag. I'm also having trouble passing the filename (as a string class string, converted back by c_str(), it compiles but gives an error).
Sorry for so many little questions, but it all basically sums up to how to efficiently store objects in a file?
What you are trying to do is called serialization. Boost has a very good library for doing this.
What you are trying to do can work, in some cases, with some very important conditions. It will only work for POD types. It is only guaranteed to work for code compiled with the same version of the compiler, and with the same arguments.
(char*)&(*i)
says to take the iterator i, dereference it to get your object, take the address of it and treat it as an array of characters. This is the start of what is being written to the file. sizeof(Referral) is the number of bytes that will be written out.
An no, an iterator is not necessarily a pointer, although pointers meet all the requirements for an iterator.
Question #1 why does ... not compile?
Answer: Because i is not a Referral* -- it's a list::iterator ;; an iterator is an abstraction over a pointer, but it's not a pointer.
Question #2 should I still be using the .txt extension?
Answer: probably not. .txt is associated by many systems to the MIME type text/plain.
Unasked Question: does this work?
Answer: if a Referral has any pointers on it, NO. When you try to read the Referrals from the file, the pointers will be pointing to the location on memory where something used to live, but there is no guarantee that there is anything valid there anymore, least of all the thing that the pointers were pointing to originally. Be careful.
isn't an iterator a pointer?
An iterator is something that acts like a pointer from the outside. In most (perhaps all) cases, it is actually some form of object instead of a bare pointer. An iterator might contain a pointer as an internal member variable that it uses to perform its job, but it just as well might contain something else or additional variables if necessary.
Furthermore, even if an iterator has a simple pointer inside of it, it might not point directly at the object you're interested in. It might point to some kind of bookkeeping component used by the container class which it can then use to get the actual object of interest. Fortunately, we don't need to care what those internal details actually are.
So with that in mind, here's what's going on in (char*)&(*i).
*i returns a reference to the object stored in the list.
& takes the address of that object, thus yielding a pointer to the object.
(char*) casts that object pointer into a char pointer.
That snippet of code would be the short form of doing something like this:
Referral& r = *i;
Referral* pr = &r;
char* pc = (char*)pr;
Why do i need to be messing around
with addresses anyway?
And why does it need the size?
fstream::write is designed to write a series of bytes to a file. It doesn't know anything about what those bytes mean. You give it an address so that it can write the bytes that exist starting wherever that address points to. You give it a size so that it knows how many bytes to write.
So if I do:
MyClass ExampleObject;
file.write((char*)ExampleObject, sizeof(ExampleObject));
Then it writes all the bytes that exist directly within ExampleObject to the file.
Note: As others have mentioned, if the object you want to write has members that dynamically allocate memory or otherwise make use of pointers, then the pointed to memory will not be written by a single simple fstream::write call.
will serialization give a significant boost in storage efficiency?
In theory, binary data can often be both smaller than plain-text and faster to read and write. In practice, unless you're dealing with very large amounts of data, you'll probably never notice the difference. Hard drives are large and processors are fast these days.
And efficiency isn't the only thing to consider:
Binary data is harder to examine, debug, and modify if necessary. At least without additional tools, but even then plain-text is still usually easier.
If your data files are going to persist between different versions of your program, then what happens if you need to change the layout of your objects? It can be irritating to write code so that a version 2 program can read objects in a version 1 file. Furthermore, unless you take action ahead of time (like by writing a version number in to the file) then a version 1 program reading a version 2 file is likely to have serious problems.
Will you ever need to validate the data? For instance, against corruption or against malicious changes. In a binary scheme like this, you'd need to write extra code. Whereas when using plain-text the conversion routines can often help fill the roll of validation.
Of course, a good serialization library can help out with some of these issues. And so could a good plain-text format library (for instance, a library for XML). If you're still learning, then I'd suggest trying out both ways to get a feel for how they work and what might do best for your purposes.
What you are trying to do (reading and writing raw memory to/from file) will invoke undefined behaviour, will break for anything that isn't a plain-old-data type, and the files that are generated will be platform dependent, compiler dependent and probably even dependent on compiler settings.
C++ doesn't have any built-in way of serializing complex data. However, there are libraries that you might find useful. For example:
http://www.boost.org/doc/libs/1_40_0/libs/serialization/doc/index.html
Did you have already a look at boost::serialization, it is robust, has a good documentation, supports versioning and if you want to switch to an XML format instead of a binary one, it'll be easier.
Fstream.write simply writes raw data to a file. The first parameter is a pointer to the starting address of the data. The second parameter is the length (in bytes) of the object, so write knows how many bytes to write.
file.write((char*)&object,sizeof(className));
^
This line is converting the address of object to a char pointer.
historyFile.write((char*)i,sizeof(Referral));
^
This line is trying to convert an object (i) into a char pointer (not valid)
historyFile.write(i,sizeof(Referral));
^
This line is passing write an object, when it expects a char pointer.