Memory waste? If main() should only return 0 or 1, why is main declared with int and not short int or even char? - c++

For example:
#include <stdio.h>
int main (void) /* Why int and not short int? - Waste of Memory */
{
printf("Hello World!");
return 0;
}
Why main() is conventional defined with int type, which allocates 4 bytes in memory on 32-bit, if it usually returns only 0 or 1, while other types such as short int (2 bytes,32-bit) or even char (1 byte,32-bit) would be more memory saving?
It is wasting memory space.
NOTE: The question is not a duplicate of the thread given; its answers only correspond to the return value itself but not its datatype at explicit focus.
The Question is for C and C++. If the answers between those alter, share your wisdom with the mention of the context of which language in particular is focused.

Usually assemblers use their registers to return a value (for example the register AX in Intel processors). The type int corresponds to the machine word That is, it is not required to convert, for example, a byte that corresponds to the type char to the machine word.
And in fact, main can return any integer value.

It's because of a machine that's half a century old.
Back in the day when C was created, an int was a machine word on the PDP-11 - sixteen bits - and it was natural and efficient to have main return that.
The "machine word" was the only type in the B language, which Ritchie and Thompson had developed earlier, and which C grew out of.
When C added types, not specifying one gave you a machine word - an int.
(It was very important at the time to save space, so not requiring the most common type to be spelled out was a Very Good Thing.)
So, since a B program started with
main()
and programmers are generally language-conservative, C did the same and returned an int.

There are two reasons I would not consider this a waste:
1 practical use of 4 byte exit code
If you want to return an exit code that exactly describes an error you want more than 8 bit.
As an example you may want to group errors: the first byte could describe the vague type of error, the second byte could describe the function that caused the error, the third byte could give information about the cause of the error and the fourth byte describes additional debug information.
2 Padding
If you pass a single short or char they will still be aligned to fit into a machine word, which is often 4 Byte/32 bit depending on architecture. This is called padding and means, that you will most likely still need 32 bit of memory to return a single short or char.

The old-fashioned convention with most shells is to use the least significant 8 bits of int, not just 0 or 1. 16 bits is increasingly common due to that being the minimum size of an int allowed by the standard.
And what would the issue be with wasting space? Is the space really wasted? Is your computer so full of "stuff" that the remaining sizeof(int) * CHAR_BIT - 8 would make a difference? Could the architecture exploit that and use those remaining bits for something else? I very much doubt it.
So I wouldn't say the memory is at all wasted since you get it back from the operating system when the program finishes. Perhaps extravagent? A bit like using a large wine glass for a small tipple perhaps?

1st: Alone your assumption/statement if it usually returns only 0 or 1 is wrong.
Usually the return code is expected to be 0 if no error occurred but otherwise it can return any number to represent different errors. And most (at least command line programs) do so. Many programs also output negative numbers.
However there are a few common used codes https://www.tldp.org/LDP/abs/html/exitcodes.html also here another SO member points to a unix header that contains some codes https://stackoverflow.com/a/24121322/2331592
So after all it is not just a C or C++ type thing but also has historical reasons how most operating systems work and expect the programs to behave and since that the languages have to support that and so at least C like languages do that by using an int main(...).
2nd:
your conclusion It is wasting memory space is wrong.
Using an int in comparison to a shorter type does not involve any waste.
Memory is usually handled in word-size (that that mean may depend from your architecture) anyway
working with sub-word-types involves computation overheand on some architecture (read: load, word, mask out unrelated bits; store: load memory, mask out variable bits, or them with the new value, write the word back)
the memory is not wasted unless you use it. if you write return 0; no memory is ever used at this point. if you return myMemorySaving8bitVar; you only have 1 byte used (most probable on the stack (if not optimized out at all))

You're either working in or learning C, so I think it's a Real Good Idea that you are concerned with efficiency. However, it seems that there are a few things that seem to need clarifying here.
First, the int data type is not an never was intended to mean "32 bits". The idea was that int would be the most natural binary integer type on the target machine--usually the size of a register.
Second, the return value from main() is meant to accommodate a wide range of implementations on different operating systems. A POSIX system uses an unsigned 8-bit return code. Windows uses 32-bits that are interpreted by the CMD shell as 2's complement signed. Another OS might choose something else.
And finally, if you're worried about memory "waste", that's an implementation issue that isn't even an issue in this case. Return codes from main are typically returned in machine registers, not in memory, so there is no cost or savings involved. Even if there were, saving 2 bytes in the run of a nontrivial program is not worth any developer's time.

The answer is "because it usually doesn't return only 0 or 1." I found this thread from software engineering community that at least partially answers your question. Here are the two highlights, first from the accepted answer:
An integer gives more room than a byte for reporting the error. It can be enumerated (return of 1 means XYZ, return of 2 means ABC, return of 3, means DEF, etc..) or used as flags (0x0001 means this failed, 0x0002 means that failed, 0x0003 means both this and that failed). Limiting this to just a byte could easily run out of flags (only 8), so the decision was probably to use an integer.
An interesting point is also raised by Keith Thompson:
For example, in the dialect of C used in the Plan 9 operating system main is normally declared as a void function, but the exit status is returned to the calling environment by passing a string pointer to the exits() function. The empty string denotes success, and any non-empty string denotes some kind of failure. This could have been implemented by having main return a char* result.
Here's another interesting bit from a unix.com forum:
(Some of the following may be x86 specific.)
Returning to the original question: Where is the exit status stored? Inside the kernel.
When you call exit(n), the least significant 8 bits of the integer n are written to a cpu register. The kernel system call implementation will then copy it to a process-related data structure.
What if your code doesn't call exit()? The c runtime library responsible for invoking main() will call exit() (or some variant thereof) on your behalf. The return value of main(), which is passed to the c runtime in a register, is used as the argument to the exit() call.
Related to the last quote, here's another from cppreference.com
5) Execution of the return (or the implicit return upon reaching the end of main) is equivalent to first leaving the function normally (which destroys the objects with automatic storage duration) and then calling std::exit with the same argument as the argument of the return. (std::exit then destroys static objects and terminates the program)
Lastly, I found this really cool example here (although the author of the post is wrong in saying that the result returned is the returned value modulo 512). After compiling and executing the following:
int main() {
return 42001;
}
on a POSIX compliant my* system, echo $? returns 17. That is because 42001 % 256 == 17 which shows that 8 bits of data are actually used. With that in mind, choosing int ensures that enough storage is available for passing the program's exit status information, because, as per this answer, compliance to the C++ standard guarantees that size of int (in bits)
can't be less than 8. That's because it must be large enough to hold "the eight-bit code units of the Unicode UTF-8 encoding form."
EDIT:
*As Andrew Henle pointed out in the comment:
A fully POSIX compliant system makes the entire int return value available, not just 8 bits. See pubs.opengroup.org/onlinepubs/9699919799/basedefs/signal.h.html: "If si_code is equal to CLD_EXITED, then si_status holds the exit value of the process; otherwise, it is equal to the signal that caused the process to change state. The exit value in si_status shall be equal to the full exit value (that is, the value passed to _exit(), _Exit(), or exit(), or returned from main()); it shall not be limited to the least significant eight bits of the value."
I think this makes for an even stronger argument for the use of int over data types of smaller sizes.

Related

C++ Question about memory pretty basic but this is confusing me

Not really too c++ related I guess but say I have a signed int
int a =50;
This sets aside like 32 bits memory for this right it'll get some bit patternand a memory address, now my question is basically well we created this variable but the computer ITSELF doesn't know what the type is it just sees some bit pattern and memory address it doesn't know this is an int, but my question is how? Does the computer know that those 4 bytes are all connected to a? And also how does the computer not know the type? It set aside 4 bytes for one variable I get that that doesn't automatically make it an int but does the computer really know nothing? The value was 50 the number and that gets turned into binary and stored in the bit pattern how does the computer know nothing
Type information is used by the compiler. It knows the size in bytes of each type and will create an executable that at runtime will correctly access memory for each variable.
C++ is a compiled language and is not run directly by the computer. The computer itself runs machine code. Since machine code is all binary, we will often look at what is called assembly language. This language uses human readable symbols to represent the machine code but also corresponds to it on a line by line basis
When the compiler sees int a = 50; It might (depending on architecture and compiler)
mov r1 #50 // move the literal 50 into the register r1
Registers are placeholders in the CPU that the cpu can use to manipulate memory. In the above statement the compiler will remember that whenever it wants to translate a statement that uses a into machine code, it needs to fetch the value from r1
In the above case, the computer decided to map a to a register, it may well use memory instead.
The type itself is not translated into machine code, it is rather used as a hint as to what types of assembly operations subsequence c++ statements will be translated into. The CPU itself does not understand types. Size is used when loading and storing values from memory to registers. With a char type one 1 byte is read, with a short 2 and an int, typically 4.
When it comes to signedness, the cpu has different instructions for signed and unsigned comparisons.
Lastly float either have to simulated using integer maths or special floating point assembler instructions need to be used.
So once translated into assembler, there is no easy way to know what the original C++ code was.

Is `reinterpret_cast<char*>(reinterpret_cast<uintptr_t>(&ch) + 1) == &ch +1` guaranteed?

I'm writing alignment-dependent code, and quite surprised that there's no standard function testing if a given pointer is aligned properly.
It seems that most code on the internet use (long)ptr or reinterpret_cast<uintptr_t>(ptr) to test the alignment and I also used them, but I wonder if using the casted pointer to integral type is standard-conformant.
Is there any system that fires the assertion here?
char ch[2];
assert(reinterpret_cast<char*>(reinterpret_cast<uintptr_t>(&ch[0]) + 1)
== &ch[1]);
To answer the question in the title: No.
Counter example: On the old Pr1me mini-computer, a normal pointer was two 16-bit words. First word was 12-bit segment number, 2 ring bits, and a flag bit (can't remember the 16th bit). Second word was a 16-bit word offset within a segment. A char* (and hence void*) needed a third word. If the flag bit was set, the third word was either 0 or 8 (being the bit offset within the addressed word). A uintptr_t for such a machine would need to be uint48_t or uint64_t. Either way, adding 1 to such an integer would not advance to the next character in memory.
A capability addressed machine is also likely to have pointers which are much larger than the address space, and there is no particular reason why the least significant part of the corresponding integer should be part of the "address" rather than part of the extra info.
In practise of course, nobody is writing C++ for a Pr1me, and capability addressed machines seem not to have appeared either. It will work on all real systems - but the standard doesn't guarantee it.

Why QVector::size returns int?

std::vector::size() returns a size_type which is unsigned and usually the same as size_t, e.g. it is 8 bytes on 64bit platforms.
In constrast, QVector::size() returns an int which is usually 4 bytes even on 64bit platforms, and at that it is signed, which means it can only go half way to 2^32.
Why is that? This seems quite illogical and also technically limiting, and while it is nor very likely that you may ever need more than 2^32 number of elements, the usage of signed int cuts that range in half for no apparent good reason. Perhaps to avoid compiler warnings for people too lazy to declare i as a uint rather than an int who decided that making all containers return a size type that makes no sense is a better solution? The reason could not possibly be that dumb?
This has been discussed several times since Qt 3 at least and the QtCore maintainer expressed that a while ago no change would happen until Qt 7 if it ever does.
When the discussion was going on back then, I thought that someone would bring it up on Stack Overflow sooner or later... and probably on several other forums and Q/A, too. Let us try to demystify the situation.
In general you need to understand that there is no better or worse here as QVector is not a replacement for std::vector. The latter does not do any Copy-On-Write (COW) and that comes with a price. It is meant for a different use case, basically. It is mostly used inside Qt applications and the framework itself, initially for QWidgets in the early times.
size_t has its own issue, too, after all that I will indicate below.
Without me interpreting the maintainer to you, I will just quote Thiago directly to carry the message of the official stance on:
For two reasons:
1) it's signed because we need negative values in several places in the API:
indexOf() returns -1 to indicate a value not found; many of the "from"
parameters can take negative values to indicate counting from the end. So even
if we used 64-bit integers, we'd need the signed version of it. That's the
POSIX ssize_t or the Qt qintptr.
This also avoids sign-change warnings when you implicitly convert unsigneds to
signed:
-1 + size_t_variable => warning
size_t_variable - 1 => no warning
2) it's simply "int" to avoid conversion warnings or ugly code related to the
use of integers larger than int.
io/qfilesystemiterator_unix.cpp
size_t maxPathName = ::pathconf(nativePath.constData(), _PC_NAME_MAX);
if (maxPathName == size_t(-1))
io/qfsfileengine.cpp
if (len < 0 || len != qint64(size_t(len))) {
io/qiodevice.cpp
qint64 QIODevice::bytesToWrite() const
{
return qint64(0);
}
return readSoFar ? readSoFar : qint64(-1);
That was one email from Thiago and then there is another where you can find some detailed answer:
Even today, software that has a core memory of more than 4 GB (or even 2 GB)
is an exception, rather than the rule. Please be careful when looking at the
memory sizes of some process tools, since they do not represent actual memory
usage.
In any case, we're talking here about having one single container addressing
more than 2 GB of memory. Because of the implicitly shared & copy-on-write
nature of the Qt containers, that will probably be highly inefficient. You need
to be very careful when writing such code to avoid triggering COW and thus
doubling or worse your memory usage. Also, the Qt containers do not handle OOM
situations, so if you're anywhere close to your memory limit, Qt containers
are the wrong tool to use.
The largest process I have on my system is qtcreator and it's also the only
one that crosses the 4 GB mark in VSZ (4791 MB). You could argue that it is an
indication that 64-bit containers are required, but you'd be wrong:
Qt Creator does not have any container requiring 64-bit sizes, it simply
needs 64-bit pointers
It is not using 4 GB of memory. That's just VSZ (mapped memory). The total
RAM currently accessible to Creator is merely 348.7 MB.
And it is using more than 4 GB of virtual space because it is a 64-bit
application. The cause-and-effect relationship is the opposite of what you'd
expect. As a proof of this, I checked how much virtual space is consumed by
padding: 800 MB. A 32-bit application would never do that, that's 19.5% of the
addressable space on 4 GB.
(padding is virtual space allocated but not backed by anything; it's only
there so that something else doesn't get mapped to those pages)
Going into this topic even further with Thiago's responses, see this:
Personally, I'm VERY happy that Qt collection sizes are signed. It seems
nuts to me that an integer value potentially used in an expression using
subtraction be unsigned (e.g. size_t).
An integer being unsigned doesn't guarantee that an expression involving
that integer will never be negative. It only guarantees that the result
will be an absolute disaster.
On the other hand, the C and C++ standards define the behaviour of unsigned
overflows and underflows.
Signed integers do not overflow or underflow. I mean, they do because the types
and CPU registers have a limited number of bits, but the standards say they
don't. That means the compiler will always optimise assuming you don't over-
or underflow them.
Example:
for (int i = 1; i >= 1; ++i)
This is optimised to an infinite loop because signed integers do not overflow.
If you change it to unsigned, then the compiler knows that it might overflow
and come back to zero.
Some people didn't like that: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30475
unsigned numbers are values mod 2^n for some n.
Signed numbers are bounded integers.
Using unsigned values as approximations for 'positive integers' runs into the problem that common values are near the edge of the domain where unsigned values behave differently than plain integers.
The advantage is that unsigned approximation reaches higher positive integers, and under/overflow are well defined (if random when looked at as a model of Z).
But really, ptrdiff_t would be better than int.

Where the C++ literal-constant storage in memory?

Where the C++ literal-constant storage in memory? stack or heap?
int *p = &2 is wrong. I want know why? Thanks
-------------------------------------------------
My question is "Where the C++ literal-constant storage in memory", "int *p = &2 is wrong",not my question.
The details depend on the machine, but assuming a commonest sort of machine and operating system... every executable file contains several "segments" - CODE, BSS, DATA and some others.
CODE holds all the executable opcodes. Actually, it's often named TEXT because somehow that made sense to people way back decades ago. Normally it's read-only.
BSS is uninitialized data - it actually doesn't need to exist in the executable file, but is allocated by the operating system's loader when the program is starting to run.
DATA holds the literal constants - the int8, int16, int32 etc along with floats, string literals, and whatever weird things the compiler and linker care to produce. This is what you're asking about. However, it holds only constants defined for use as variables, as in
const long x = 2;
but unlikely to hold literal constants used in your source code but not tightly associated with a variable. Just a lone '2' is dealt with directly by the compiler. For example in C,
print("%d", 2);
would cause the compiler to build a subroutine call to print(), writing opcodes to push a pointer to the string literal "%d" and the value 2, both as 64-bit integers on a 64-bit machine (you're not one of those laggards still using 32-bit hardware, are you? :) followed by the opcode to jump to a subroutine at (identifier for 'print' subroutine).
The "%d" literal goes into DATA. The 2 doesn't; it's built into the opcode that stuffs integers onto the stack. That might actually be a "load register RAX immediate" followed by the value 2, followed by a "push register RAX", or maybe a single opcode can do the job. So in the final executable file, the 2 will be found in the CODE (aka TEXT) segment.
It typically isn't possible to make a pointer to that value, or to any opcode. It just doesn't make sense in terms of what high level languages like C do (and C is "high level" when you're talking about opcodes and segments.) "&2" can only be an error.
Now, it's not entirely impossible to have a pointer to opcodes. Whenever you define a function in C, or an object method, constructor or destructor in C++, the name of the function can be thought of as a pointer to the first opcode of the machine code compiled from that function. For example, print() without the parentheses is a pointer to a function. Maybe if your example code were in a function and you guess the right offset, pointer arithmetic could be used to point to that "immediate" value 2 nestled among the opcodes, but this is not going to be easy for any contemporary CPU, and certainly isn't for beginners.
Let me quote relevant clauses of C++03 Standard.
5.3.1/2
The result of the unary & operator is a pointer to its operand. The
operand shall be an lvalue.
An integer literal is an rvalue (however, I haven't found a direct quote in C++03 Standard, but C++11 mentiones that as a side note in 3.10/1).
Therefore, it's not possible to take an address of an integer literal.
What about the exact place where 2 is stored, it depends on usage. It might be a part of an machine instruction, or it might be optimized away, e.g. j=i*2 might become j=i+i. You should not rely upon it.
You have two questions:
Where are literal constants stored? With the exception of string
literals (which are actual objects), pretty much wherever the
implementation wants. It will usually depend on what you're doing with
them, but on a lot of architectures, integral constants (and often some
special floating point constants, like 0.0) will end up as part of a
machine instruction. When this isn't possible, they'll usually be
placed in the same logical segment as the code.
As to why taking the address of an rvalue is illegal, the main reason is
because the standard says so. Historically, it's forbidden because such
constants often never exist as a separate object in memory, and thus
have no address. Today... one could imagine other solutions: compilers
are smart enough to put them in memory if you took their address, and
not otherwise; and rvalues of class type do have a memory address.
The rules are somewhat arbitrary (and would be, regardless of what they
were)—hopefully, any rules which would allow taking the address of
a literal would make its type int const*, and not int*.

C++ : why bool is 8 bits long?

In C++, I'm wondering why the bool type is 8 bits long (on my system), where only one bit is enough to hold the boolean value ?
I used to believe it was for performance reasons, but then on a 32 bits or 64 bits machine, where registers are 32 or 64 bits wide, what's the performance advantage ?
Or is it just one of these 'historical' reasons ?
Because every C++ data type must be addressable.
How would you create a pointer to a single bit? You can't. But you can create a pointer to a byte. So a boolean in C++ is typically byte-sized. (It may be larger as well. That's up to the implementation. The main thing is that it must be addressable, so no C++ datatype can be smaller than a byte)
Memory is byte addressable. You cannot address a single bit, without shifting or masking the byte read from memory. I would imagine this is a very large reason.
A boolean type normally follows the smallest unit of addressable memory of the target machine (i.e. usually the 8bits byte).
Access to memory is always in "chunks" (multiple of words, this is for efficiency at the hardware level, bus transactions): a boolean bit cannot be addressed "alone" in most CPU systems. Of course, once the data is contained in a register, there are often specialized instructions to manipulate bits independently.
For this reason, it is quite common to use techniques of "bit packing" in order to increase efficiency in using "boolean" base data types. A technique such as enum (in C) with power of 2 coding is a good example. The same sort of trick is found in most languages.
Updated: Thanks to a excellent discussion, it was brought to my attention that sizeof(char)==1 by definition in C++. Hence, addressing of a "boolean" data type is pretty tied to the smallest unit of addressable memory (reinforces my point).
The answers about 8-bits being the smallest amount of memory that is addressable are correct. However, some languages can use 1-bit for booleans, in a way. I seem to remember Pascal implementing sets as bit strings. That is, for the following set:
{1, 2, 5, 7}
You might have this in memory:
01100101
You can, of course, do something similar in C / C++ if you want. (If you're keeping track of a bunch of booleans, it could make sense, but it really depends on the situation.)
I know this is old but I thought I'd throw in my 2 cents.
If you limit your boolean or data type to one bit then your application is at risk for memory curruption. How do you handle error stats in memory that is only one bit long?
I went to a job interview and one of the statements the program lead said to me was, "When we send the signal to launch a missle we just send a simple one bit on off bit via wireless. Sending one bit is extremelly fast and we need that signal to be as fast as possible."
Well, it was a test to see if I understood the concepts and bits, bytes, and error handling. How easy would it for a bad guy to send out a one bit msg. Or what happens if during transmittion the bit gets flipped the other way.
Some embedded compilers have an int1 type that is used to bit-pack boolean flags (e.g. CCS series of C compilers for Microchip MPU's). Setting, clearing, and testing these variables uses single-instruction bit-level instructions, but the compiler will not permit any other operations (e.g. taking the address of the variable), for the reasons noted in other answers.
Note, however, that std::vector<bool> is allowed to use bit-packing, i.e. to store the bits in smaller units than an ordinary bool. But it is not required.