A unique type of data conversion - C++

In the following code (reconstructed here as a complete program; the original snippet omitted the int i, tt; declarations mentioned later in the answer):

#include <cstdio>

int main()
{
    int i, tt;
    tt = 5;
    for (i = 0; i < tt; i++)
    {
        int c, d, l;
        scanf("%lld%lld%lld", &c, &d, &l);  // %lld expects long long*, but c, d, l are int
        printf("%d %d %d %d", c, d, l, tt);
    }
}
in the first iteration, the value of tt changes to 0 automatically.
I know that I have declared c, d, and l as int while reading them as long long, so that makes c and d 0. But I am still not able to understand how tt becomes 0.

Small but obligatory announcement. As was said in the comments, you are facing undefined behavior, so:
- don't be surprised by tt being assigned zero;
- don't be surprised by tt not being assigned zero after insignificant code changes (e.g. reordering the declaration from "int i,tt;" to "int tt, i;" or vice versa);
- don't be surprised by tt not being assigned zero after compiling with different flags, a different compiler version, a different platform, or different test input;
- don't be surprised by anything. Any behavior is possible.
You can't expect this code to work one way or another, so don't ever use it in a real program.
However, you seem to be OK with that, and the question is "what is actually happening with tt?". IMHO this question is really great: it reveals a passion for understanding programming more deeply, and it helps in digging into the lower layers. So let's get started.
Possible explanation
I failed to reproduce the behavior on VS2015, but the situation is quite clear. The actual data alignment, variable sizes, endianness, stack growth direction, and other details may differ on your PC, but the general idea should be the same.
The variables i, tt, c, d, and l are local, so they are stored on the stack. Let's assume sizeof(int) is 4 and sizeof(long long) is 8, which is quite common. Then one possible data alignment is sketched below (addresses grow from left to right; each cell represents four bytes):
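(The original answer included a picture; since it isn't reproduced here, this text sketch shows one possible layout, consistent with the description that follows. Your compiler may lay things out differently.)

low addresses                                         high addresses
 ... |  l  |  d  |  c  |  tt  | ...
        ^     ^     ^
        |     |     +-- an 8-byte write starting at &c also clobbers
        |     |         the 4 bytes of tt
        |     +-- an 8-byte write starting at &d also clobbers c
        +-- an 8-byte write starting at &l also clobbers d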
When doing scanf, you pass the address of c to be filled with data. But the size of the data is 8 bytes, so the data of both c and tt gets overwritten (as shown in the sketch above). With a little-endian representation, you always write zeroes to tt unless a really big number is entered by the user, while c actually gets valid data for small numbers.
However, the valid data in c will be overwritten the same way while filling d, and the same will happen to d while filling l. So only l will get a nonzero value in the described case. An easy test: enter large numbers for c, d, and l and check whether tt is still zero.
How to get precise answer
You can get all the answers from the assembly code. Enable a disassembly listing (exact steps depend on the toolchain: gcc has the -S option, Visual Studio has a "Go To Disassembly" item in the context menu while on a breakpoint) and analyze the listing. It's really helpful to see the exact instructions your CPU is going to execute. Some debuggers allow executing instructions one by one. So you need to find out how the variables are aligned on the stack and when exactly they are overwritten. Analyzing scanf is hard for beginners, so you can start with a simplified version of your program: replace scanf with the following (can't test, but it should work):
*((long long *)(&c)) = 1; // or any other user-specified value
*((long long *)(&d)) = 2;
*((long long *)(&l)) = 3;
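Putting that together, a minimal self-contained harness might look like the sketch below (the behavior is still undefined, so the output will vary by compiler, flags, and platform; the casts merely simulate scanf writing 8 bytes through an int*):

#include <cstdio>

int main()
{
    int i, tt = 5;
    for (i = 0; i < tt; i++)
    {
        int c, d, l;
        *((long long *)(&c)) = 1;  // simulate an 8-byte write at &c
        *((long long *)(&d)) = 2;
        *((long long *)(&l)) = 3;
        printf("%d %d %d %d\n", c, d, l, tt);
    }
}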

Related

C++ Question about memory pretty basic but this is confusing me

Not really too C++ related, I guess, but say I have a signed int:
int a = 50;
This sets aside 32 bits of memory for it, right? It'll get some bit pattern and a memory address. Now, my question is basically: we created this variable, but the computer ITSELF doesn't know what the type is; it just sees some bit pattern and a memory address, and it doesn't know this is an int. So how does the computer know that those 4 bytes are all connected to a? And also, how does the computer not know the type? It set aside 4 bytes for one variable; I get that that doesn't automatically make it an int, but does the computer really know nothing? The value was the number 50, which gets turned into binary and stored in the bit pattern. How can the computer know nothing?
Type information is used by the compiler. It knows the size in bytes of each type and will create an executable that, at runtime, will correctly access memory for each variable.
C++ is a compiled language and is not run directly by the computer. The computer itself runs machine code. Since machine code is all binary, we will often look at what is called assembly language. This language uses human-readable symbols to represent the machine code and corresponds to it on a line-by-line basis.
When the compiler sees int a = 50;, it might emit (depending on architecture and compiler):
mov r1, #50 // move the literal 50 into the register r1
Registers are placeholders in the CPU that the CPU can use to manipulate memory. In the above statement the compiler will remember that whenever it wants to translate a statement that uses a into machine code, it needs to fetch the value from r1.
In the above case the compiler decided to map a to a register; it may well use memory instead.
The type itself is not translated into machine code; rather, it is used as a hint as to what kinds of assembly operations subsequent C++ statements will be translated into. The CPU itself does not understand types. Size matters when loading and storing values between memory and registers: with a char type, 1 byte is read; with a short, 2; and with an int, typically 4.
When it comes to signedness, the CPU has different instructions for signed and unsigned comparisons.
Lastly, floats either have to be simulated using integer maths, or special floating-point assembler instructions need to be used.
So once translated into assembler, there is no easy way to know what the original C++ code was.
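As a small illustration of the size point, the same source-level operation compiles to differently sized loads and stores depending on the declared type; a sketch you can inspect with g++ -S or a similar tool (exact instructions vary by architecture):

char  inc_char(char c)   { return c + 1; }  // byte-sized load/store (e.g. movzx/movsx on x86)
short inc_short(short s) { return s + 1; }  // 16-bit load/store
int   inc_int(int i)     { return i + 1; }  // 32-bit load/store on typical targets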

Memory waste? If main() should only return 0 or 1, why is main declared with int and not short int or even char?

For example:
#include <stdio.h>
int main (void) /* Why int and not short int? - Waste of Memory */
{
printf("Hello World!");
return 0;
}
Why is main() conventionally defined with the int type, which allocates 4 bytes of memory on 32-bit systems, if it usually returns only 0 or 1, when other types such as short int (2 bytes on 32-bit) or even char (1 byte on 32-bit) would be more memory-saving?
It is wasting memory space.
NOTE: The question is not a duplicate of the thread given; its answers address only the return value itself, not its data type, which is the explicit focus here.
The question is for C and C++. If the answers differ between them, share your wisdom and mention which language in particular you are talking about.
Usually implementations use a register to return a value (for example the register AX on Intel processors). The type int corresponds to the machine word; that is, no conversion is required, whereas, for example, a byte corresponding to the type char would have to be widened to the machine word.
And in fact, main can return any integer value.
It's because of a machine that's half a century old.
Back in the day when C was created, an int was a machine word on the PDP-11 - sixteen bits - and it was natural and efficient to have main return that.
The "machine word" was the only type in the B language, which Ritchie and Thompson had developed earlier, and which C grew out of.
When C added types, not specifying one gave you a machine word - an int.
(It was very important at the time to save space, so not requiring the most common type to be spelled out was a Very Good Thing.)
So, since a B program started with
main()
and programmers are generally language-conservative, C did the same and returned an int.
There are two reasons I would not consider this a waste:
1. Practical use of a 4-byte exit code
If you want to return an exit code that precisely describes an error, you want more than 8 bits.
As an example, you may want to group errors: the first byte could describe the broad type of error, the second byte could identify the function that caused the error, the third byte could give information about the cause of the error, and the fourth byte could carry additional debug information (see the sketch below).
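A minimal sketch of that grouping scheme (the field names are hypothetical; note that many shells surface only the low 8 bits of the status, as other answers here point out):

unsigned make_exit_code(unsigned char kind, unsigned char func,
                        unsigned char cause, unsigned char debug)
{
    return ((unsigned)kind << 24) | ((unsigned)func << 16)
         | ((unsigned)cause << 8) | debug;
}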
2. Padding
If you pass a single short or char, it will still be aligned to fit into a machine word, which is often 4 bytes/32 bits depending on the architecture. This is called padding and means that you will most likely still need 32 bits of memory to return a single short or char.
The old-fashioned convention with most shells is to use the least significant 8 bits of int, not just 0 or 1. 16 bits is increasingly common due to that being the minimum size of an int allowed by the standard.
And what would the issue be with wasting space? Is the space really wasted? Is your computer so full of "stuff" that the remaining sizeof(int) * CHAR_BIT - 8 would make a difference? Could the architecture exploit that and use those remaining bits for something else? I very much doubt it.
So I wouldn't say the memory is wasted at all, since you get it back from the operating system when the program finishes. Perhaps extravagant? A bit like using a large wine glass for a small tipple, perhaps?
1st: Your assumption/statement "if it usually returns only 0 or 1" is wrong to begin with.
Usually the return code is expected to be 0 if no error occurred, but otherwise it can return any number to represent different errors, and most (at least command-line) programs do so. Many programs also output negative numbers.
However, there are a few commonly used codes: https://www.tldp.org/LDP/abs/html/exitcodes.html; also, here another SO member points to a Unix header that contains some codes: https://stackoverflow.com/a/24121322/2331592
So after all it is not just a C or C++ type thing; it also has historical reasons in how most operating systems work and expect programs to behave, and since the languages have to support that, at least C-like languages do so by using an int main(...).
2nd: Your conclusion "It is wasting memory space" is wrong.
Using an int in comparison to a shorter type does not involve any waste:
- Memory is usually handled in word-sized units anyway (what that means exactly may depend on your architecture).
- Working with sub-word types involves computational overhead on some architectures (to read: load the word, mask out the unrelated bits; to store: load the word, mask out the variable's bits, OR in the new value, write the word back).
- The memory is not wasted unless you use it. If you write return 0;, no memory is ever used at this point. If you return myMemorySaving8bitVar;, you use only 1 byte (most probably on the stack, if not optimized out entirely).
You're either working in or learning C, so I think it's a Real Good Idea that you are concerned with efficiency. However, there are a few things that need clarifying here.
First, the int data type is not and never was intended to mean "32 bits". The idea was that int would be the most natural binary integer type on the target machine, usually the size of a register.
Second, the return value from main() is meant to accommodate a wide range of implementations on different operating systems. A POSIX system uses an unsigned 8-bit return code. Windows uses 32 bits, which are interpreted by the CMD shell as two's-complement signed. Another OS might choose something else.
And finally, if you're worried about memory "waste", that's an implementation issue that isn't even an issue in this case. Return codes from main are typically returned in machine registers, not in memory, so there is no cost or savings involved. Even if there were, saving 2 bytes in the run of a nontrivial program is not worth any developer's time.
The answer is "because it usually doesn't return only 0 or 1." I found this thread from software engineering community that at least partially answers your question. Here are the two highlights, first from the accepted answer:
An integer gives more room than a byte for reporting the error. It can be enumerated (return of 1 means XYZ, return of 2 means ABC, return of 3, means DEF, etc..) or used as flags (0x0001 means this failed, 0x0002 means that failed, 0x0003 means both this and that failed). Limiting this to just a byte could easily run out of flags (only 8), so the decision was probably to use an integer.
An interesting point is also raised by Keith Thompson:
For example, in the dialect of C used in the Plan 9 operating system main is normally declared as a void function, but the exit status is returned to the calling environment by passing a string pointer to the exits() function. The empty string denotes success, and any non-empty string denotes some kind of failure. This could have been implemented by having main return a char* result.
Here's another interesting bit from a unix.com forum:
(Some of the following may be x86 specific.)
Returning to the original question: Where is the exit status stored? Inside the kernel.
When you call exit(n), the least significant 8 bits of the integer n are written to a CPU register. The kernel's system-call implementation will then copy it to a process-related data structure.
What if your code doesn't call exit()? The C runtime library responsible for invoking main() will call exit() (or some variant thereof) on your behalf. The return value of main(), which is passed to the C runtime in a register, is used as the argument to the exit() call.
Related to the last quote, here's another from cppreference.com
5) Execution of the return (or the implicit return upon reaching the end of main) is equivalent to first leaving the function normally (which destroys the objects with automatic storage duration) and then calling std::exit with the same argument as the argument of the return. (std::exit then destroys static objects and terminates the program)
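To make the last two quotes concrete, conceptually the startup code does something like the sketch below (greatly simplified; real C runtimes differ, the names are made up, and app_main stands in for the program's main, since C++ forbids calling main from within a program):

#include <cstdlib>

extern int app_main();   // hypothetical stand-in for the program's main

void startup_sketch()    // hypothetical stand-in for the real CRT entry point
{
    int status = app_main();  // the return value arrives in a register
    std::exit(status);        // the runtime hands it to the kernel as the exit status
}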
Lastly, I found this really cool example here (although the author of the post is wrong in saying that the result returned is the returned value modulo 512). After compiling and executing the following:
int main() {
    return 42001;
}
on my* system, echo $? returns 17. That is because 42001 % 256 == 17, which shows that 8 bits of data are actually used. With that in mind, choosing int ensures that enough storage is available for passing the program's exit status information, because, as per this answer, compliance with the C++ standard guarantees that the size of int (in bits) can't be less than 8. That's because it must be large enough to hold "the eight-bit code units of the Unicode UTF-8 encoding form."
EDIT:
*As Andrew Henle pointed out in the comment:
A fully POSIX compliant system makes the entire int return value available, not just 8 bits. See pubs.opengroup.org/onlinepubs/9699919799/basedefs/signal.h.html: "If si_code is equal to CLD_EXITED, then si_status holds the exit value of the process; otherwise, it is equal to the signal that caused the process to change state. The exit value in si_status shall be equal to the full exit value (that is, the value passed to _exit(), _Exit(), or exit(), or returned from main()); it shall not be limited to the least significant eight bits of the value."
I think this makes for an even stronger argument for the use of int over data types of smaller sizes.

Which to use? int32_t vs uint32_t [duplicate]

When is it appropriate to use an unsigned variable over a signed one? What about in a for loop?
I hear a lot of opinions about this and I wanted to see if there was anything resembling a consensus.
for (unsigned int i = 0; i < someThing.length(); i++) {
    SomeThing var = someThing.at(i);
    // You get the idea.
}
I know Java doesn't have unsigned values, and that must have been a conscious decision on Sun Microsystems' part.
I was glad to find a good conversation on this subject, as I hadn't really given it much thought before.
In summary, signed is a good general choice - even when you're dead sure all the numbers are positive - if you're going to do arithmetic on the variable (as in a typical for-loop case).
Unsigned starts to make more sense when:
- You're going to do bitwise things like masks, or
- You're desperate to squeeze out the extra positive range by using the sign bit as a value bit.
Personally, I like signed because I don't trust myself to stay consistent and avoid mixing the two types (as the article warns against).
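A classic sketch of that mixing pitfall (this example is mine, not from the thread):

#include <cstdio>
#include <vector>

int main()
{
    std::vector<int> v;  // empty, so v.size() == 0
    // size() is unsigned, so v.size() - 1 wraps around to a huge value
    // instead of -1; a loop such as
    //     for (std::size_t i = 0; i < v.size() - 1; ++i) ...
    // would run (and index out of bounds) instead of not running at all.
    std::printf("%zu\n", v.size() - 1);  // prints 18446744073709551615 on a typical 64-bit system
}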
In your example above, where i will always be positive and a higher range would be beneficial, unsigned would be useful. The same goes if you're using #define statements, such as:
#define BIT1  ((unsigned int)1)
#define BIT32 ((unsigned int)0x80000000)  // some really big number
especially when these values will never change.
However, if you're doing an accounting program where the people are irresponsible with their money and are constantly in the red, you will most definitely want to use signed.
I do agree with saint, though, that a good rule of thumb is to use signed, which C actually defaults to, so you're covered.
I would think that if your business case dictates that a negative number is invalid, you would want to have an error shown or thrown.
With that in mind, I only recently found out about unsigned integers while working on a project processing data from a binary file and storing it in a database. I was purposely "corrupting" the binary data and ended up getting negative values instead of an expected error. I found that even though the value converted, it was not valid for my business case.
My program did not error out, and I ended up getting wrong data into the database. It would have been better if I had used uint and had the program fail.
C and C++ compilers will generate a warning when you compare signed and unsigned types; in your example code, you couldn't make your loop variable unsigned and have the compiler generate code without warnings (assuming said warnings were turned on).
Naturally, you're compiling with warnings turned all the way up, right?
And, have you considered compiling with "treat warnings as errors" to take it that one step further?
The downside of using signed numbers is that there's a temptation to overload them so that, for example, the values 0..n are the menu selection and -1 means nothing is selected, rather than creating a class with two variables: one to indicate whether something is selected and another to store what that selection is. Before you know it, you're testing for negative one all over the place, and the compiler is complaining that you're comparing the menu selection against the number of menu selections you have, which is dangerous because they're different types. So don't do that.
size_t is often a good choice for this, or size_type if you're using an STL class.
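For instance, a sketch of a loop that avoids mixing signedness entirely by using the container's own size type:

#include <vector>

void process(const std::vector<int>& v)
{
    // size_type is exactly what size() returns, so there is no
    // signed/unsigned comparison warning and no conversion surprise
    for (std::vector<int>::size_type i = 0; i < v.size(); ++i) {
        // use v[i] ...
    }
}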

Is it bad practice to operate on a structure and assign the result to the same structure? Why?

I don't recall seeing examples of code like this hypothetical snippet:
cpu->dev.bus->uevent = (cpu->dev.bus->uevent) >> 16; //or the equivalent using a macro
in which a member in a large structure gets dereferenced using pointers, operated on, and the result assigned back to the same field of the structure.
The kernel seems to be a place where such large structures are frequent but I haven't seen examples of it and became interested as to the reason why.
Is there a performance reason for this, maybe related to the time required to follow the pointers? Is it simply not good style and if so, what is the preferred way?
There's nothing wrong with the statement syntactically, but it's easier to code it like this:
cpu->dev.bus->uevent >>= 16;
It's much more a matter of history: the kernel is mostly written in C (not C++), which in the original development intention (the K&R era) was conceived as a "high-level assembler" whose statements and expressions should have a literal correspondence between C and ASM. In that environment, ++i, i += 1, and i = i + 1 were completely different things that translated into completely different CPU instructions.
Compiler optimizations, at that time, were not so advanced and popular, so the idea of following the pointer chain twice was often avoided by first storing the resulting destination address in a local temporary variable (most likely a register) and then doing the assignment
(like int *p = &(a->b->c->d); *p = *p >> 16;)
or by trying to use a compound instruction (like a->b->c >>= 16;).
With today's computers (multicore processors, multilevel caches, and pipelining), execution inside registers can be ten times faster than memory access, and following three pointers is faster than storing an address in memory and loading it back, thus reversing the priorities of the old "business model".
Compiler optimization, then, can freely change the produced code to tune it for size or speed, depending on which is considered more important and on what kind of processor you are working with.
So, nowadays, it doesn't really matter whether you write ++i, i += 1, or i = i + 1: the compiler will most likely produce the same code, attempting to access i only once. Likewise, following the pointer chain twice will most likely be rewritten as equivalent to (cpu->dev.bus->uevent) >>= 16, since >>= corresponds to a single machine instruction on x86-derivative processors.
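A quick sketch you can check yourself (compile with optimizations, e.g. g++ -O2 -S, and compare the assembly; on typical compilers all three functions produce identical instructions):

void f(int &i) { ++i; }
void g(int &i) { i += 1; }
void h(int &i) { i = i + 1; }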
That said ("it doesn't really matter"), it is also true that code style tends to reflect the styles and fashions of the age when it was first written (since later developers tend to maintain consistency).
Your code is not "bad" by itself; it just looks "odd" in the place where it is usually written.
Just to give you an idea of what pipelining and branch prediction are, consider the comparison of two vectors:
bool equal(size_t n, int *a, int *b)
{
    for (size_t i = 0; i < n; ++i)
        if (a[i] != b[i]) return false;
    return true;
}
Here, as soon as we find something different, we shortcut and say they are different.
Now consider this:
bool equal(size_t n, int *a, int *b)
{
    register size_t c = 0;
    for (register size_t i = 0; i < n; ++i)
        c += (a[i] == b[i]);
    return c == n;
}
There is no shortcut: even if we find a difference, we continue to loop and count.
But having removed the if from inside the loop, if n isn't that big (let's say less than 20), this can be 4 or 5 times faster!
An optimizing compiler can even recognize this situation and, provided there are no differing side effects, rework the first version into the second!
I see nothing wrong with something like that; it appears as innocuous as:
i = i + 42;
If you're accessing the data items a lot, you could consider something like:
tSomething *cdb = cpu->dev.bus;
cdb->uevent = cdb->uevent >> 16;
// and many more accesses to cdb here
but, even then, I'd tend to leave it to the optimiser, which tends to do a better job than most humans anyway :-)
There's nothing inherently wrong with doing
cpu->dev.bus->uevent = (cpu->dev.bus->uevent) >> 16;
but depending on the type of uevent, you need to be careful when shifting right like that, so you don't accidentally shift unexpected bits into your value. For instance, if it's a 64-bit value:
uint64_t uevent = 0xDEADBEEF00000000;
uevent = uevent >> 16; // now uevent is 0x0000DEADBEEF0000;
if you thought you were shifting a 32-bit value and then pass the new uevent to a function taking a 64-bit value, you're not passing 0xBEEF0000, as you might have expected. Since the sizes fit (a 64-bit value passed as a 64-bit parameter), you won't get any compiler warnings here (which you would if you passed a 64-bit value as a 32-bit parameter).
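To make that scenario concrete, here is a small sketch (handle is a made-up consumer function):

#include <cstdint>
#include <cstdio>

void handle(std::uint64_t v) { std::printf("%#llx\n", (unsigned long long)v); }

int main()
{
    std::uint64_t uevent = 0xDEADBEEF00000000ull;
    uevent >>= 16;
    handle(uevent);  // prints 0xdeadbeef0000, not 0xbeef0000: the high DEAD
                     // bits were shifted down, and the compiler has no reason
                     // to warn because the argument and parameter sizes match
}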
Also interesting to note is that the above operation, while similar to
i = ++i;
which is undefined behavior (see http://josephmansfield.uk/articles/c++-sequenced-before-graphs.html for details), is still well defined, since there are no side effects in the right-hand side expression.

Initialize a variable

Is it better to declare and initialize the variable or just declare it?
What's the best and the most efficient way?
For example, I have this code:
#include <stdio.h>
int main()
{
    int number = 0;
    printf("Enter a number: ");
    scanf("%d", &number);
    if (number < 0)
        number = -number;
    printf("The modulo is: %d\n", number);
    return 0;
}
If I don't initialize number, the code works fine, but I want to know: is it faster, better, more efficient? Is it good to initialize the variable?
scanf can fail, in which case nothing is written to number. So if you want your code to be correct, you need to initialize it (or check the return value of scanf).
The speed of incorrect code is usually irrelevant, but for your example code, if there is a difference in speed at all, I doubt you would ever be able to measure it. Setting an int to 0 is much faster than I/O.
Don't attribute speed to a language; that attribute belongs to implementations of the language. There are fast implementations and slow implementations. There are optimisations associated with fast implementations; a compiler that produces well-optimised machine code will optimise the initialisation away if it can deduce that the initialisation isn't needed.
In this case, it actually does need the initialisation. Consider what happens if scanf fails. When scanf fails, its return value reflects this failure. It'll return either:
1. A value less than zero if there was a read error or EOF (which can be triggered in an implementation-defined way, typically Ctrl+Z on Windows and Ctrl+D on Linux);
2. A number less than the number of objects provided to scanf (since you've provided only one object, this failure return value would be 0) when a conversion failure occurs (for example, entering 'a' on stdin when you've told scanf to convert sequences of '0'..'9' into an integer); or
3. The number of objects scanf managed to assign to, which is 1 in your case.
Since you aren't checking for any of these return values (particularly #3), your compiler can't deduce that the initialisation is unnecessary and hence can't optimise it away. When the variable is uninitialised, failure to check these return values results in undefined behaviour; a chicken might appear to be living even when it is missing its head. It would be best to check the return value of scanf. That way, when your variable is uninitialised you can avoid using an uninitialised value, and when it isn't, your compiler can optimise away the initialisation, presuming you handle erroneous return values by producing error messages rather than using the variable.
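For illustration, a sketch of the checked version (the error message and exit code are my own choices):

#include <stdio.h>

int main(void)
{
    int number;                       /* no initializer needed if scanf is checked */
    printf("Enter a number: ");
    if (scanf("%d", &number) != 1) {  /* conversion failure, read error, or EOF */
        fprintf(stderr, "Invalid input.\n");
        return 1;                     /* number is never read uninitialised */
    }
    if (number < 0)
        number = -number;
    printf("The modulo is: %d\n", number);
    return 0;
}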
Edit: On that topic of undefined behaviour, consider what happens in this code:
if (number < 0)
    number = -number;
If number is -32768 and INT_MAX is 32767, then section 6.5, paragraph 5 of the C standard applies, because -(-32768) isn't representable as an int.
Section 6.5, paragraph 5 says:
If an exceptional condition occurs during the evaluation of an expression (that is, if the result is not mathematically defined or not in the range of representable values for its type), the behavior is undefined.
Suppose you don't initialize a variable and your code is buggy (e.g. you forgot to read number). Then the uninitialized value of number is garbage, and different runs will output (or behave in) different ways.
But if you initialize all of your variables, the program will produce a consistent result: an easy-to-trace error.
Yes, initialization adds extra steps to your code at the low level, for example a mov $0, 28(%esp). But it's a one-time task; it doesn't kill your code's efficiency.
So always initializing is a good practice!
With modern compilers, there isn't going to be any difference in efficiency. Coding style is the main consideration. In general, your code is more self-explanatory and less likely to have mistakes if you initialize all variables upon declaring them. In the case you gave, though, since the variable is effectively initialized by the scanf, I'd consider it better not to have a redundant initialization.
First, you need to answer these questions:
1) How many times is this function called? If you call it 10,000,000 times, it's a good idea to have the best version.
2) If I don't initialize my variable, am I sure that my code is safe and won't throw any exception?
That said, an int initialization doesn't change much in your code, but a string initialization does.
Be sure that you do all the checks, because if you have a non-initialized variable your program is potentially buggy.
I can't tell you how many times I've seen simple errors because a programmer doesn't initialize a variable. Just two days ago there was another question on SO where the end result of the issue being faced was simply that the OP didn't initialize a variable and thus there were problems.
When you talk about "speed" and "efficiency", don't simply consider how much faster the code might compile or run (and in this case it's pretty much irrelevant anyway); consider also your debugging time when there's a simple mistake in the code due to the fact that you didn't initialize a variable that very easily could have been.
Note also, in my experience, when coding for larger corporations they will run your code through tools like Coverity or Klocwork, which will ding you for uninitialized variables because they present a security risk.