Who decides the sizeof of any datatype or structure (depending on 32 bit or 64 bit)? The compiler or the processor? For example, sizeof(int) is 4 bytes for a 32-bit system whereas it's 8 bytes for a 64-bit system.
I also read that sizeof(int) is 4 bytes when compiled using both 32-bit and 64-bit compiler.
Suppose my CPU can run both 32-bit as well as 64-bit applications; which will play the main role in deciding the size of the data: the compiler or the processor?
It's ultimately the compiler. The compiler implementors can decide to emulate whatever integer size they see fit, regardless of what the CPU handles most efficiently. That said, the C (and C++) standard is written such that the compiler implementor is free to choose the fastest and most efficient way. For many compilers, the implementers chose to keep int at 32 bits, even though the CPU natively handles 64-bit ints very efficiently.
I think this was done in part to increase portability for programs written when 32-bit machines were the most common, which expected an int to be 32 bits and no wider. (It could also be, as user3386109 points out, that 32-bit data was preferred because it takes less space and can therefore be accessed faster.)
So if you want to make sure you get 64-bit ints, you use int64_t instead of int to declare your variable. If you know your value will fit inside 32 bits or you don't care about size, you use int to let the compiler pick the most efficient representation.
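For instance, a minimal sketch (assuming a C++11 compiler and a platform that provides the optional int64_t type):

#include <cstdint>
#include <cstdio>

int main()
{
    std::int64_t big = 5000000000LL;  // guaranteed to be exactly 64 bits wide
    int natural = 42;                 // whatever width the compiler considers most efficient (at least 16 bits)
    std::printf("sizeof(big) = %zu, sizeof(natural) = %zu\n", sizeof big, sizeof natural);
}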
As for the other datatypes such as struct, they are composed from the base types such as int.
It's not the CPU, nor the compiler, nor the operating system. It's all three at the same time.
The compiler can't just make things up. It has to adhere to the ABI[1] that the operating system provides. If the structs and system calls provided by the operating system have types with certain sizes and alignment requirements, the compiler isn't really free to make up its own reality, unless the compiler developers want to reimplement wrapper functions for everything the operating system provides. The ABI of the operating system, in turn, can't just be completely made up either; it has to do what can reasonably be done on the CPU. And very often the ABI of one operating system will be very similar to the ABIs of other operating systems on the same CPU, because it's easier to reuse the work already done (on compilers, among other things).
For computers that support both 32-bit and 64-bit code, there still needs to be work done by the operating system to support running programs in both modes (because it has to provide two different ABIs). Some operating systems don't do that, and on those you don't have a choice.
[1] ABI stands for Application Binary Interface. It's a set of rules for how a program interacts with the operating system. It defines how a program is stored on disk to be runnable by the operating system, how to do system calls, how to link with libraries, etc. To be able to link to libraries, for example, your program and the library have to agree on how to make function calls between your program and the library (and vice versa), and to make function calls both the program and the library have to have the same idea of stack layout, register usage, calling conventions, etc. And for function calls you need to agree on what the parameters mean, which includes the sizes, alignment and signedness of types.
It is strictly, 100%, entirely the compiler that decides the value of sizeof(int). It is not a combination of the system and the compiler. It is just the compiler (and the C/C++ language specifications).
If you develop iPad or iPhone apps, the compiler runs on your Mac. The Mac and the iPhone/iPad use different processors. Nothing about your Mac tells the compiler what size should be used for int on the iPad.
The processor designer determines what registers and instructions are available, what the alignment rules for efficient access are, how big memory addresses are, and so on.
The C standard sets minimum requirements for the built-in types. "char" must be at least 8 bits, "short" and "int" must be at least 16 bits, "long" must be at least 32 bits and "long long" must be at least 64 bits. It also says that "char" must be equivalent to the smallest unit of memory the program can address and that the size ordering of the standard types must be maintained.
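To make those guarantees concrete, here is a small sketch (assuming a C++11 compiler with static_assert) that restates them as compile-time checks:

#include <climits>

// These checks restate the standard's minimums; they can only ever fail on a
// non-conforming implementation (note: sizeof(T) * CHAR_BIT counts padding bits too).
static_assert(CHAR_BIT >= 8,                      "char must be at least 8 bits");
static_assert(sizeof(short) * CHAR_BIT >= 16,     "short must be at least 16 bits");
static_assert(sizeof(int) * CHAR_BIT >= 16,       "int must be at least 16 bits");
static_assert(sizeof(long) * CHAR_BIT >= 32,      "long must be at least 32 bits");
static_assert(sizeof(long long) * CHAR_BIT >= 64, "long long must be at least 64 bits");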
Other standards may also have an impact. For example, version 2 of the Single UNIX Specification says that int must be at least 32 bits.
Finally, existing code has an impact. Porting is hard enough already; no one wants to make it any harder than they have to.
When porting an OS and compiler to a new CPU, someone has to define what is known as a "C ABI". This defines how binary code talks to other binary code, including:
The size and alignment requirements of the built-in types.
The packing rules for structures (and hence what their size will be).
How parameters are passed and returned.
How the stack is managed.
In general, once an ABI is defined for a combination of CPU family and OS, it doesn't change much (sometimes the size of a more obscure type like "long double" changes). Changing it brings a bunch of breakage for relatively little gain.
Similarly those porting an OS to a platform with similar characteristics to an existing one will usually choose the same sizes as on previous platforms that the OS was ported to.
In practice OS/compiler vendors typically settle on one of a few combinations of sizes for the basic integer types.
"LP32": char is 8 bits. short and int are 16 bits, long and pointer are 32-bits. Commonly used on 8 bit and 16 bit platforms.
"ILP32": char is 8 bits, short is 16 bits. int, long and pointer are all 32 bits. If long long exists it is 64 bit. Commonly used on 32 bit platforms.
"LLP64": char is 8 bits. short is 16 bits. int and long are 32 bits. long long and pointer are 64 bits. Used on 64 bit windows.
"LP64": char is 8 bits. short is 16 bits. int is 32 bits. long, long long and pointer are 64 bits. Used on most 64-bit unix-like systems.
"ILP64": char is 8 bits, short is 16 bits, int, long and pointer and long long are all 64 bits. Apparently used on some early 64-bit operating systems but rarely seen nowadays.
64 bit processors can typically run both 32-bit and 64-bit binaries. Generally this is handled by having a compatibility layer in your OS. So your 32-bit binary uses the same data types it would use when running on a 32-bit system, then the compatibility layer translates the system calls so that the 64-bit OS can handle them.
The compiler decides how large the basic types are, and what the layout of structures is. If a library declares any types, it will decide how those are defined and therefore what size they are.
However, it is often the case that compatibility with an existing standard, and the need to link against existing libraries produced by other compilers, forces a given implementation to make certain choices. For example, the language standard effectively requires wchar_t to be wide enough to hold any supported character (more than 16 bits if the full Unicode range is supported); on Linux it is 32 bits wide, but it has always been 16 bits on Windows, so compilers for Windows all choose to be compatible with the Windows API instead of the language standard. A lot of legacy code for both Linux and Windows assumes that a long is exactly 32 bits wide, while other code assumes it is wide enough to hold a timestamp in seconds, an IPv4 address, a file offset or the bits of a pointer, and (after one compiler defined int as 64 bits wide and long as 32 bits wide) the language standard made a new rule that int cannot be wider than long.
As a result, mainstream compilers from this century choose to define int as 32 bits wide, but historically some have defined it as 16 bits, 18 bits, 32 bits, 64 bits and other sizes. Some compilers let you choose whether long will be exactly 32 bits wide, as some legacy code assumes, or as wide as a pointer, as other legacy code assumes.
This demonstrates how assumptions you make today, like some type always being 32 bits wide, might come back to bite you in the future. This has already happened to C codebases twice, in the transitions to 32-bit and 64-bit code.
But what should you actually use?
The int type is rarely useful these days. There's usually some other type you can use that makes a stronger guarantee about what you'll get. (It does have one advantage: types narrower than int get automatically widened to int, which can cause a few really weird bugs when you mix signed and unsigned types, and int is the smallest type guaranteed not to undergo that promotion.)
If you’re using a particular API, you’ll generally want to use the same type it does. There are numerous types in the standard library for specific purposes, such as clock_t for clock ticks and time_t for time in seconds.
If you want the fastest type that's at least 16 bits wide, that's int_fast16_t, and there are other similar types. (Unless otherwise specified, all these types are defined in <stdint.h>.) If you want the smallest type that's at least 32 bits wide, to pack the most data into your arrays, that's int_least32_t. If you want the widest possible type, that's intmax_t. If you know you want exactly 32 bits, and your compiler has a type like that, it's int32_t. If you want something that's 32 bits wide on a 32-bit machine and 64 bits wide on a 64-bit machine, and always the right size to store a pointer, that's intptr_t. If you want a good type for doing array indexing and pointer math, that's ptrdiff_t from <stddef.h>. (This one's in a different header because it's from C89, not C99.)
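A short sketch of what those choices look like in practice (assuming a C99/C++11 toolchain that provides <stdint.h>, <stddef.h> and <time.h>):

#include <stdint.h>   // int_fast16_t, int_least32_t, int32_t, intptr_t, intmax_t
#include <stddef.h>   // ptrdiff_t
#include <time.h>     // time_t

int_fast16_t  counter    = 0;    // fastest type that is at least 16 bits wide
int_least32_t packed[64] = {};   // smallest type that is at least 32 bits wide (good for big arrays)
int32_t       exact      = 0;    // exactly 32 bits, only if the platform provides such a type
intptr_t      addr       = 0;    // wide enough to hold a pointer value
ptrdiff_t     index      = 0;    // result type of pointer subtraction, good for indexing
time_t        stamp      = 0;    // the standard library's own type for time in seconds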
Use the type you really mean!
When you talk about the compiler, you must have a clear picture of build|host|target, i.e., the machine you are building on (build), the machine the compiler will run on (host), and the machine GCC will produce code for (target), because cross compiling is very different from native compiling.
As for the question of who decides the sizeof of a datatype and structure: it depends on the target system you told the compiler to build the binary for. If the target is a 64-bit LP64 system (such as most 64-bit Unix systems), the compiler will translate sizeof(long) to 8, and if the target is a 32-bit machine, the compiler will translate sizeof(long) to 4. All of this is predefined by the header files you used to build your program. If you read your `$MAKETOP/usr/include/stdint.h', there are typedefs that define the sizes of your datatypes.
To avoid errors created by size differences, the Google C++ Style Guide (Integer Types section) recommends using types like int16_t, uint32_t, int64_t, etc., which are defined in <stdint.h>.
The above only covers plain old data such as int. For a structure there is another story, because the size of a structure also depends on packing and on the alignment boundaries of each field in the structure, which affect the overall size.
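As a small illustration (the exact numbers are implementation-defined; the comments assume the common case of a 4-byte int with 4-byte alignment):

#include <cstdio>

struct Padded {    // typically 12 bytes: 1 + 3 (padding) + 4 + 1 + 3 (trailing padding)
    char a;
    int  b;
    char c;
};

struct Reordered { // typically 8 bytes: 4 + 1 + 1 + 2 (trailing padding)
    int  b;
    char a;
    char c;
};

int main()
{
    std::printf("Padded: %zu bytes, Reordered: %zu bytes\n",
                sizeof(Padded), sizeof(Reordered));
}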
It's the compiler, and more precisely its code generator component.
Of course, the compiler is architecture-aware and makes choices that fit with it.
In some cases, the work is performed in two passes: one at compile time by an intermediate code generator, then a second at run time by a just-in-time compiler. But this is still a compiler.
I have a program that uses the following to write floats to a file, which will ultimately be read on a user's computer.
// computer A
float buffer[1024];
...
fwrite(reinterpret_cast<void*>(buffer), sizeof(float), 1024, file);
// computer B
float buffer[1024];
fread(reinterpret_cast<void*>(buffer), sizeof(float), 1024, file);
The programs on the two computers are not the same, but they are compiled with the same compiler and settings (I wouldn't expect this to work out otherwise). Will the floats be interpreted as expected across all typical desktop computers given both programs are compiled to target the platform, or is it possible the second computer will interpret the bytes differently?
The programs on the two computers are not the same, but they are compiled with the same compiler and settings (I wouldn't expect this to work out otherwise). Will the floats be interpreted as expected across all typical desktop computers, or is it possible the second computer will interpret the bytes differently?
Pretty much all modern desktop computers use IEEE 754 floating point format for their single-precision floating point numbers, so you should be okay.
One potential fly in the ointment is endian-ness: if you write out the file on a computer with a big-endian CPU and then read it on a computer that has a little-endian CPU (or vice-versa) then the reading computer will not interpret the file's values correctly. This is not a big problem in the last few years since almost all commonly used CPUs are little-endian these days, but previously that problem was commonly seen e.g. when transferring data from an Intel-based computer to a PowerPC-based computer, or vice-versa. A common way to handle the problem would be to specify a standard/canonical endian-ness (doesn't matter which one) for the values in your file, and be sure to byte-swap the values when saving (or loading) the file if the computer you are saving/loading them on doesn't match the canonical endian-ness specified by your file format.
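A hedged sketch of that idea: declare little-endian as the file's canonical order (the choice and the helper names here are purely illustrative), and swap only on big-endian hosts:

#include <cstdint>
#include <cstring>
#include <utility>

// Detects the host's byte order at run time, without compiler extensions.
static bool host_is_little_endian()
{
    const std::uint16_t probe = 1;
    unsigned char first_byte;
    std::memcpy(&first_byte, &probe, 1);
    return first_byte == 1;
}

// Copies a float into 4 bytes in the file's canonical (little-endian) order.
// Assumes the usual 4-byte IEEE 754 float.
static void float_to_file_order(float value, unsigned char out[4])
{
    std::memcpy(out, &value, sizeof value);
    if (!host_is_little_endian()) {
        std::swap(out[0], out[3]);   // reverse the bytes on a big-endian host
        std::swap(out[1], out[2]);
    }
}

Loading mirrors this: read the 4 bytes, swap them on a big-endian host, then memcpy them into a float.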
This is probably a duplicate question...
It is definitely possible that the format can be interpreted differently.
However, you said "typical desktop computers", which generally means x86, x64, maybe ARM. But all of these use little-endian binary formats. So in practice you'd probably be OK.
Everything depends on the compiler. But to avoid problems you can check some properties of float for your platform.
E.g. check std::numeric_limits<float>::digits; if it is 24, it means the compiler is using the IEEE 754 single-precision format for float.
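A minimal sketch of such a check (assuming a C++11 compiler; is_iec559 answers the IEEE 754 question directly):

#include <limits>

// Refuse to build if float is not the 4-byte IEEE 754 single-precision format
// that the file format relies on.
static_assert(std::numeric_limits<float>::is_iec559, "float is not IEEE 754");
static_assert(std::numeric_limits<float>::digits == 24, "float does not have a 24-bit significand");
static_assert(sizeof(float) == 4, "float is not 4 bytes");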
Would the size of an integer depend upon the compiler, OS and processor?
The answer to this question depends on how far from practical considerations we are willing to get.
Ultimately, in theory, everything in C and C++ depends on the compiler and only on the compiler. Hardware/OS is of no importance at all. The compiler is free to implement a hardware abstraction layer of any thickness and emulate absolutely anything. There's nothing to prevent a C or C++ implementation from implementing the int type with any size and any representation, as long as it is large enough to meet the minimum requirements specified in the language standard. Practical examples of such a level of abstraction are readily available, e.g. programming languages based on a "virtual machine" platform, like Java.
However, C and C++ are intended to be highly efficient languages. In order to achieve maximum efficiency, a C or C++ implementation has to take into account certain considerations derived from the underlying hardware. For that reason it makes a lot of sense to make sure that each basic type is based on some representation directly (or almost directly) supported by the hardware. In that sense, the sizes of basic types do depend on the hardware.
In other words, a specific C or C++ implementation for a 64-bit hardware/OS platform is absolutely free to implement int as a 71-bit 1's-complement signed integral type that occupies 128 bits of memory, using the other 57 bits as padding bits that are always required to store the birthdate of the compiler author's girlfriend. This implementation will even have certain practical value: it can be used to perform run-time tests of the portability of C/C++ programs. But that's where the practical usefulness of that implementation would end. Don't expect to see something like that in a "normal" C/C++ compiler.
Yes, it depends on both the processor (more specifically, the ISA, instruction set architecture, e.g., x86 and x86-64) and the compiler, including its programming model. For example, on 16-bit machines, sizeof(int) was 2 bytes; 32-bit machines have 4 bytes for int. int has traditionally been considered the native size of the processor, i.e., the size of a register. However, 32-bit computers were so popular, and such a huge amount of software was written for the 32-bit programming model, that it would be very confusing if 64-bit computers used 8 bytes for int. Both Linux and Windows keep int at 4 bytes. But they differ in the size of long.
Please take a look at the 64-bit programming model like LP64 for most *nix and LLP64 for Windows:
http://www.unix.org/version2/whatsnew/lp64_wp.html
http://en.wikipedia.org/wiki/64-bit#64-bit_data_models
Such differences are actually quite awkward when you write code that should work on both Windows and Linux. So I always use int32_t or int64_t, rather than long, via stdint.h.
Yes, it would. Did they mean "which would it depend on: the compiler or the processor"? In that case the answer is basically "both." Normally, int won't be bigger than a processor register (unless that's smaller than 16 bits), but it could be smaller (e.g. a 32-bit compiler running on a 64-bit processor). Generally, however, you'll need a 64-bit processor to run code with a 64-bit int.
Based on some recent research I have done studying up for firmware interviews:
The most significant impact of the processor's bit width (8-bit, 16-bit, 32-bit, 64-bit) is on how data needs to be stored in order to compute on variables in the minimum number of cycles.
The bit size of your processor tells you the natural word length the CPU is capable of handling in one cycle. A 32-bit machine needs 2 cycles to handle a 64-bit double even if it is aligned properly in memory. Most personal computers were, and many still are, 32-bit, which is the most likely reason for C compilers' typical affinity for 32-bit integers, with options for larger floating point numbers and long long ints.
Clearly you can compute with larger variable sizes, so in that sense the CPU's bit architecture determines how it will have to store larger and smaller variables in order to achieve the best possible processing efficiency, but it is in no way a limiting factor on the byte sizes of ints or chars; those are part of the compilers and dictated by convention or standards.
I found this site very helpful, http://www.geeksforgeeks.org/archives/9705, for explaining how the CPU's natural word length affects how it will choose to store and handle larger and smaller variable types, especially with regard to bit packing into structs. You have to be very cognizant of how you choose to order your variables, because larger variables need to be aligned in memory so they take the fewest number of cycles when divided by the CPU's word length. Poorly ordering the fields of a struct will add a lot of potentially unnecessary padding/empty space.
The simple and correct answer is that it depends on the compiler. That doesn't mean architecture is irrelevant, but the compiler deals with that, not your application. More accurately, you could say it depends on the (target) architecture of the compiler, for example whether it is 32 bits or 64 bits.
Consider a Windows application that creates a file where it writes an int plus other things and reads it back. What happens if you run this on both 32-bit and 64-bit Windows? What happens if you copy the file created on the 32-bit system and open it on the 64-bit system?
You might think the size of int will be different in each file, but no, they will be the same, and this is the crux of the question. You pick the settings in the compiler to target a 32-bit or 64-bit architecture, and that dictates everything.
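If the file is meant to be shared across builds, the safer habit is to write an explicitly sized type rather than plain int; a minimal sketch (the function name is just illustrative, and byte order is a separate concern):

#include <cstdint>
#include <cstdio>

// Writes the value as exactly 4 bytes regardless of what the build thinks int is.
bool write_i32(std::FILE* file, std::int32_t value)
{
    return std::fwrite(&value, sizeof value, 1, file) == 1;
}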
http://www.agner.org/optimize/calling_conventions.pdf
"3 Data representation" contains good overview of what compilers do with integral types.
Data type sizes depend on the processor in the sense that the compiler wants memory accesses to be easy for the CPU. For example, if the processor is 32-bit, the compiler will usually not choose a 2-byte int when it could choose 4 bytes, because accessing data that doesn't match the machine word can take additional CPU cycles, which is wasteful. If the compiler chooses a 4-byte int, the CPU can access the full 4 bytes in one shot, which speeds up your application.
The size of int is equal to the word length, which depends on the underlying ISA. The processor is just the hardware implementation of the ISA, and the compiler is just the software-side implementation of the ISA. Everything revolves around the underlying ISA. The most popular ISA these days is Intel's IA-32; it has a word length of 32 bits, or 4 bytes. 4 bytes is therefore typically the size of 'int' (just plain int, not short or long) that compilers targeting IA-32 use.
The size of a data type basically depends on the compiler, and compilers are designed around the architecture of processors, so in practice data type sizes can be considered compiler dependent. For example, the size of an integer is 2 bytes with the 16-bit Turbo C compiler but 4 bytes with the GCC compiler, even though both run on the same processor.
Yes, I found that the size of int in Turbo C was 2 bytes, whereas in the MSVC compiler it was 4 bytes.
Basically, the size of int has traditionally matched the size of the processor's registers, although that is no longer true on most 64-bit platforms.
I'm confused with the byte order of a system/cpu/program.
So I must ask some questions to make my mind clear.
Question 1
If I only use type char in my C++ program:
void main()
{
char c = 'A';
char* s = "XYZ";
}
Then I compile this program to an executable binary file called a.out.
Can a.out run on both little-endian and big-endian systems?
Question 2
If my Windows XP system is little-endian, can I install a big-endian Linux system in VMWare/VirtualBox?
What makes a system little-endian or big-endian?
Question 3
If I want to write a byte-order-independent C++ program, what do I need to take into account?
Can a.out run on both little-endian and big-endian systems?
No, because pretty much any two CPUs that are so different as to have different endian-ness will not run the same instruction set. C++ isn't Java; you don't compile to something that gets compiled or interpreted. You compile to the assembly for a specific CPU. And endian-ness is part of the CPU.
But that's outside of endian issues. You can compile that program for different CPUs and those executables will work fine on their respective CPUs.
What makes a system little-endian or big-endian?
As far as C or C++ is concerned, the CPU. Different processing units in a computer can actually have different endians (the GPU could be big-endian while the CPU is little endian), but that's somewhat uncommon.
If I want to write a byte-order independent C++ program, what do I need to take into account?
As long as you play by the rules of C or C++, you don't have to care about endian issues.
Of course, you also won't be able to load files directly into POD structs. Or read a series of bytes, pretend it is a series of unsigned shorts, and then process it as a UTF-16-encoded string. All of those things step into the realm of implementation-defined behavior.
There's a difference between "undefined" and "implementation-defined" behavior. When the C and C++ specs say something is "undefined", it basically means all manner of brokenness can ensue. If you keep doing it (and your program doesn't crash), you could get inconsistent results. When they say that something is defined by the implementation, you will get consistent results for that implementation.
If you compile for x86 in VC2010, what happens when you pretend a byte array is an unsigned short array (ie: unsigned char *byteArray = ...; unsigned short *usArray = (unsigned short*)byteArray) is defined by the implementation. When compiling for big-endian CPUs, you'll get a different answer than when compiling for little-endian CPUs.
In general, endian issues are things you can localize to input/output systems. Networking, file reading, etc. They should be taken care of in the extremities of your codebase.
Question 1:
Can a.out run on both little-endian and big-endian systems?
No. Because a.out is already compiled for whatever architecture it is targeting. It will not run on another architecture that it is incompatible with.
However, the source code for that simple program has nothing that could possibly break on different endian machines.
So yes, it (the source) will work properly. (Well... aside from void main(); you should be using int main() instead.)
Question 2:
If my Windows XP system is little-endian, can I install a big-endian Linux system in VMWare/VirtualBox?
Endianness is determined by the hardware, not the OS. So whatever (native) VM you install on it will have the same endianness as the host (since x86 is all little-endian).
What makes a system little-endian or big-endian?
Here's an example of something that will behave differently on little vs. big-endian:
uint64_t a = 0x0123456789abcdefull;
uint32_t b = *(uint32_t*)&a;
printf("b is %x\n", b);
Note that this violates strict aliasing, and is only for demonstration purposes.
Little Endian : b is 89abcdef
Big Endian : b is 1234567
On little-endian, the lower bits of a are stored at the lowest address. So when you access a as a 32-bit integer, you will read the lower 32 bits of it. On big-endian, you will read the upper 32 bits.
Question 3:
If I want to write a byte-order independent C++ program, what do I need to take into account?
Just follow the standard C++ rules and don't do anything ugly like the example I've shown above. Avoid undefined behavior, avoid type-punning...
Little-endian/big-endian is a property of the hardware. In general, binary code compiled for one piece of hardware cannot run on another, except in virtualization environments that interpret the machine code and emulate the target hardware. There are bi-endian CPUs (e.g. ARM, IA-64) that feature a switch to change endianness.
As far as byte-order-independent programming goes, the only case where you really need to do it is when dealing with networking. There are functions such as ntohl and htonl to help you convert between your hardware's byte order and network byte order.
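A small usage sketch of those functions (assuming a POSIX system, where they are declared in <arpa/inet.h>; on Windows they come from the Winsock headers instead):

#include <arpa/inet.h>   // htonl, ntohl (POSIX)
#include <cstdint>

// Convert a 32-bit value to network byte order (big-endian) before sending or storing it...
std::uint32_t to_network(std::uint32_t host_value)
{
    return htonl(host_value);
}

// ...and convert it back to host byte order after receiving or loading it.
std::uint32_t from_network(std::uint32_t net_value)
{
    return ntohl(net_value);
}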
The first thing to clarify is that endianness is a hardware attribute, not a software/OS attribute, so Windows XP and Linux are not big-endian or little-endian; rather, the hardware on which they run is either big-endian or little-endian.
Endianness is a description of the order in which the bytes of a data type are stored. A big-endian system stores the most significant byte (the "big end") first, and a little-endian system stores the least significant byte first. It is not mandatory for every datatype to follow the same convention on a system, so you can have mixed-endian systems.
A program compiled for a little-endian system would not run on a big-endian system, but that has more to do with the instruction set available than with the endianness of the system on which it was compiled.
If you want to write a byte-order independent program you simply need to not depend on the byte order of your data.
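One common way to do that is to serialize values byte by byte with shifts, so the code never depends on how the host lays the integer out in memory; a minimal sketch:

#include <cstdint>

// Writes a 32-bit value as 4 bytes, least significant byte first,
// producing the same output on little- and big-endian hosts.
void put_u32_le(std::uint32_t v, unsigned char out[4])
{
    out[0] = static_cast<unsigned char>(v);
    out[1] = static_cast<unsigned char>(v >> 8);
    out[2] = static_cast<unsigned char>(v >> 16);
    out[3] = static_cast<unsigned char>(v >> 24);
}

// Reads it back; again identical on either kind of host.
std::uint32_t get_u32_le(const unsigned char in[4])
{
    return  static_cast<std::uint32_t>(in[0])
          | static_cast<std::uint32_t>(in[1]) << 8
          | static_cast<std::uint32_t>(in[2]) << 16
          | static_cast<std::uint32_t>(in[3]) << 24;
}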
1: The output of the compiler will depend on the options you give it and whether you use a cross-compiler. By default, it will run on the operating system you are compiling on and not on others (perhaps not even on others of the same type; not all Linux binaries run on all Linux installs, for example). In large projects, this will be the least of your concerns, as libraries, etc., will need to be built and linked differently on each system. Using a proper build system (like make) will take care of most of this without you needing to worry.
2: Virtual machines abstract the hardware in such a way as to allow essentially anything to run within anything else. How the operating systems manage their memory is unimportant as long as they both run on the same hardware and support whatever virtualization model is in use. Endianness means the byte order: whether data is read left-to-right or right-to-left (or some other format). Some hardware supports both, and virtualization allows both to coexist in that case (although I am not aware of how this would be useful except that it is possible in theory). However, Linux works on many different architectures (and Windows on some other than x86), so the situation is more complicated.
3: If you monkey with raw memory, such as with bitwise operators and type punning, you might put yourself in a position of depending on endianness. However, most modern programming is at a higher level than this, so you are likely to notice if you get into something that may impose endianness-based limitations. If that is ever required, you can always implement options for both endiannesses using the preprocessor.
The endianness of a system determines how bytes are ordered in memory, i.e., which byte is considered the "first" and which the "last".
You only need to care about it when loading from or saving to sources external to your program, like disk or the network.