I have learnt that while printing the address of a variable in C, we use unsigned int (%u in the format string).
Its range is from 0 to 65535. Now an integer takes 4 bytes of memory, which means a maximum of 16384 (65536/4) integers can be stored. What will happen if we try to declare an array int a[20000] and get the address of each of its elements?
#include <stdio.h>

int main(void)
{
    int a[20000];
    int i;

    for (i = 0; i < 19999; i++)
        printf("%u\n", &a[i]);   /* printing the address with %u, as I was taught */
}
In the early days of C, a pointer and an int were similar types, and you could safely cast from one to the other and back again. In those days, pointers and int were both 16 bits long. It is still true on 32-bit systems, where int is now 32 bits. But it is false on 64-bit systems, because pointers are 64 bits long while int is only 32 bits.
So I do not know where and how you learnt that while printing the address of a variable in C we use unsigned int (%u), but forget it ASAP, because in the general case it is wrong. The only foolproof way to print an address is %p, because the system automatically adapts the size to 16, 32 or 64 bits.
And stop converting pointers to int or back, because it is highly non-portable. From that other post: a more portable way (in the C99 standard variant) is to #include <stdint.h> and then cast pointers to intptr_t (and back). This integer type is guaranteed to be the size of a pointer.
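As a minimal sketch of that %p / intptr_t approach (the variable names here are just for illustration):

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

int main(void)
{
    int x = 42;

    /* Portable: %p expects a void *, whatever the pointer width is. */
    printf("address of x: %p\n", (void *)&x);

    /* If an integer is really needed, round-trip through uintptr_t. */
    uintptr_t as_int = (uintptr_t)&x;
    printf("same address as an integer: %" PRIuPTR "\n", as_int);

    int *back = (int *)as_int;   /* converts back to the original pointer */
    printf("value through the round-tripped pointer: %d\n", *back);
    return 0;
}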
And, I almost forgot: 65536 = 0x10000 = 2^16, and 2^32 = 4294967296. So you were not that far from reality, and it is true that on older 16-bit systems you could not have int array[40000], because int was 16 bits long and the array would exhaust all the available memory.
But on 32-bit systems, you can address 4 GB of memory, so int array[20000] is harmless.
A pointer is a memory address where you can find some data. We can find the size of a pointer variable using the sizeof() operator, so its size doesn't depend on what it points at. It depends on how many bytes a memory address takes up on your system, which is 4 for a 32-bit compiler and 8 for a 64-bit compiler.
If we declare a pointer double *j, the type of j is double *, i.e. "a pointer to double".
%p is the correct format specifier for printing a pointer. %p outputs addresses in the hexadecimal notation.
Sometimes people use the %u and %x (unsigned int in hexadecimal form) specifiers to print a pointer variable. It is, however, undefined behavior to pass a pointer for a %x or %u argument.
It may nevertheless appear to work with 32-bit compilers such as the one shipped with Code::Blocks. This is because the size of unsigned int and the size of a pointer are the same there (both 4 bytes).
(It is false to assume that int and pointers have the same width. For both GCC 64-bit and MSVC 64-bit running on x64, sizeof(void *) == 8, while sizeof(unsigned int) == 4. It just so happens that on some architectures pointers and ints are the same size, e.g. the PDP-11/34a, or most 32-bit architectures nowadays. But it is extremely unwise to ever write code that relies on it.
You can add the two extra lines below and verify:
printf("size of unsigned int is %lu\n", sizeof(unsigned int));
printf("size of pointer is %lu\n", sizeof(int *));
On a 64-bit GCC machine with a 64-bit operating system, this should give you 4 and 8 respectively.)
On a 64-bit GCC machine, %x treats your pointer argument as a 32-bit unsigned integer, while the pointer itself is 8 bytes (64 bits) long. Printing with %p prints the whole pointer, in its complete size of 64 bits, but with %x only the lower 32 bits are printed. Hence it is always safer to print a pointer with %p.
Actually, unsigned integers have a range of 0 to 65535 on older 16-bit compilers like Turbo C. Nowadays almost all systems have a 32- or 64-bit architecture, on which the unsigned integer range is 0 to 4294967295 (about 4 giga). So this code should work fine in current compilers like GCC (under Linux) or Visual Studio (under Windows). Try switching to these compilers; they are very good and are widely used nowadays. 16-bit compilers are obsolete, so avoid using them. If you are using Windows, then Code::Blocks or Dev-C++ are good IDEs for learning C.
P.S. avoid using %u for printing addresses. Use %p instead.
Related
I'm a bit confused about a sizeof result.
I have this :
unsigned long part1 = 0x0200000001;
cout << sizeof(part1); // this gives 8 bytes
If I count correctly, part1 is 9 bytes long, right?
Can anybody clarify this for me?
Thanks.
If I count correctly, part1 is 9 bytes long, right?
No, you are counting incorrectly. 0x0200000001 can fit into five bytes. One byte is represented by two hex digits; hence the bytes are 02 00 00 00 01.
I suppose you misinterpret the meaning of sizeof. sizeof(type) returns the number of bytes that the system reserves to hold any value of the respective type. So sizeof(long) on a 32-bit Linux system will probably give you 4 bytes and 8 bytes on a 64-bit one, sizeof(char[20]) gives 20, and so on.
Note that one can also apply sizeof to identifiers of (typed) variables, e.g. int x; sizeof(x); the type is then deduced from the variable declaration/definition, such that sizeof(x) is the same as sizeof(int) in this case.
But: sizeof never interprets or analyses the content / the value of a variable at runtime, even if "sizeof" somehow sounds as if it would. So for char *x = "Hello, world"; sizeof(x) is not the length of the string literal "Hello, world" but the size of the type char *.
So your sizeof(part1) is the same as sizeof(unsigned long), which is 8 on your system regardless of what the actual content of part1 is at runtime.
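A small sketch of that point, assuming an LP64 system where unsigned long is 8 bytes (part2 is just a second illustrative variable):

#include <stdio.h>

int main(void)
{
    unsigned long part1 = 0x0200000001UL;   /* the value is irrelevant to sizeof */
    unsigned long part2 = 1UL;

    printf("%zu\n", sizeof(part1));         /* 8 on an LP64 system */
    printf("%zu\n", sizeof(part2));         /* also 8: sizeof depends only on the type */
    printf("%zu\n", sizeof(unsigned long)); /* same again */
    return 0;
}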
An unsigned long has a minimum range of 0 to 4294967295; that's 4 bytes.
Assigning 0x0200000001 (8589934593) to an unsigned long that's not big enough triggers a conversion so that the value fits. For unsigned types this conversion is well defined: the value is reduced modulo 2^N (in effect, the higher bits are discarded).
sizeof tells you the number of bytes a type uses. It won't tell you how many bytes are needed to represent your particular value.
sizeof(part1) (I'm assuming "part 1" is a typo) gives the size of an unsigned long, i.e. sizeof(unsigned long). The size of part1 is therefore the same regardless of what value is stored in it.
On your compiler, sizeof(unsigned long) has the value 8. The size of types is implementation-defined for all types (other than the char types, which are defined to have a size of 1), so it may vary between compilers.
The value of 9 you are expecting is the size of the output you would obtain by writing the value of part1 to a file or string as human-readable hex, with no leading zeros or prefix. That has no relationship to the sizeof operator whatsoever. And, when outputting a value, it is possible to format it in different ways (e.g. hex versus decimal versus octal, leading zeros or not), which affect the size of the output.
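A quick sketch of that difference, assuming a platform where unsigned long is 64 bits (buf and text_len are just illustrative names):

#include <stdio.h>

int main(void)
{
    unsigned long part1 = 0x0200000001UL;
    char buf[32];

    int text_len = snprintf(buf, sizeof buf, "%lx", part1);

    printf("sizeof(part1) = %zu\n", sizeof part1);                /* 8: storage size of the type */
    printf("hex text \"%s\" has %d characters\n", buf, text_len); /* 9: length of the formatted text */
    return 0;
}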
sizeof(part1) returns the size of the data type of the variable part1, which you have defined as unsigned long.
The size of unsigned long for your compiler is 64 bits, or 8 bytes: 8 groups of 8 bits. The hexadecimal representation is a human-readable form of the binary format, where each hex digit is 4 bits long. We humans often omit leading zeroes for clarity; computers never do.
Let's consider a single byte of data - a char - and the value zero.
- decimal: 0
- hexadecimal : 0x0 (often written as 0x00)
- binary: 0000 0000
For a list of C++ data types and their corresponding bit-size check out the documentation at (the msvc documentation is easier to read):
for msvc: https://msdn.microsoft.com/en-us/library/s3f49ktz(v=vs.71).aspx
for gcc: https://www.gnu.org/software/gnu-c-manual/gnu-c-manual.html#Data-Types
All compilers have documentation for their data sizes, since they depend on the hardware and the compiler itself. If you use a different compiler, a google search on "'your compiler name' data sizes" will help you find the correct sizes for your compiler.
Here I get 4225440 as the address of arr[0]; as it is an integer array, the address increases by 4, so the next one will be 4225444.
Now, what happens with those addresses? If I manually put one of those addresses into a pointer and dereference it, it shows an absurd value. Where does that value come from?
This is the code under discussion
#include <stdio.h>

int arr[10], i, a, *j;
void del(int a);

int main(void)
{
    for (i = 0; i < 4; i++)
        scanf("%d", &arr[i]);

    j = (int *)4225443;          /* hard-coded address, 3 bytes past the observed &arr[0] */

    for (i = 0; i < 4; i++)
    {
        printf("\n%d ", arr[i]);
        printf(" %d ", &arr[i]); /* printing an address with %d, as described above */
    }
    printf(" %d ", *j);
}
j = (int *)4225443;
/* ... */
printf(" %d ", *j);
C has its word to say:
(C11, 6.3.2.3p5) "An integer may be converted to any pointer type. Except as previously specified, the result is implementation-defined, might not be correctly aligned, might not point to an entity of the referenced type, and might be a trap representation."
In your case you can add that you are also violating aliasing rules.
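If you genuinely want to read an int that starts a few bytes into the array, a safer sketch (not from the original post) takes the address from the array itself and copies the bytes out with memcpy, instead of dereferencing a hard-coded integer address:

#include <stdio.h>
#include <string.h>

int main(void)
{
    int arr[10] = {1, 2, 3, 4};

    /* View the array as raw bytes; unsigned char may alias anything. */
    unsigned char *bytes = (unsigned char *)arr;

    /* Read the int that starts 3 bytes into the array without violating
       alignment or aliasing rules, by copying its bytes into a real int. */
    int misaligned_value;
    memcpy(&misaligned_value, bytes + 3, sizeof misaligned_value);

    printf("address of arr[0]: %p\n", (void *)&arr[0]);
    printf("int read at byte offset 3: %d\n", misaligned_value);
    return 0;
}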
Most of the CPUs that we use today have either a 32-bit or a 64-bit wide bus between the CPU and the memory. Let's use the 32-bit wide bus for demonstration purposes.
In general, each memory access will read (or write) 32 bits, where the first address of those 32 bits is evenly divisible by 4, i.e. a 32-bit boundary. In such an architecture, an int will start at an address that is evenly divisible by 4 and will be 4 bytes (32 bits) long.
In general, when the address of 'something' is NOT on a 32-bit boundary (i.e. the address is not evenly divisible by 4), the CPU will:
for a read: read the whole 32 bits from memory, starting at the 32-bit boundary, then, within the CPU, using the registers and the logic and math operations, extract the desired byte;
for a write: read the whole 32 bits from memory, starting at the 32-bit boundary, then, within the CPU, using the registers and the logic and math operations, modify the desired byte, then write the whole 32 bits back to memory.
In other words, accessing memory other than on 32-bit boundaries is SLOW.
Unfortunately, some CPUs, if requested to read/write some value to/from memory at other than a 32-bit boundary, will raise a bus error.
Regarding the 'unbelievable' value of the int when one of its bytes is modified: an int (let's use a little-endian architecture) is 4 bytes, aligned on a 32-bit boundary (i.e. the lowest address of the int is on a 32-bit boundary). Let's say, for example, that the int contains 5; then its representation in memory, lowest address first, is 0x05, 0x00, 0x00, 0x00. If the byte at (address of the int + 2) is then set to some value, say 3, the int's bytes become 0x05, 0x00, 0x03, 0x00, which is the value 0x00030005. When that int is printed, it will display 196613.
Note: the order of the bytes in memory is different on a big-endian architecture.
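A short sketch of that effect; the commented result assumes a little-endian machine with a 4-byte int:

#include <stdio.h>

int main(void)
{
    int n = 5;
    unsigned char *p = (unsigned char *)&n;  /* inspecting bytes through unsigned char is allowed */

    /* On little-endian, the bytes of 5 are 05 00 00 00 (lowest address first).
       Overwriting the byte at offset 2 with 3 gives 05 00 03 00,
       i.e. the int 0x00030005 = 196613. */
    p[2] = 3;
    printf("%d\n", n);   /* 196613 on little-endian; a different value on big-endian */
    return 0;
}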
It will print the value located at address 4225443, if that address is valid in your process; otherwise it will produce a memory access violation.
I was just wondering how I can tell whether my laptop is a 64- or a 32-bit machine (it is 64-bit).
So, I thought about printing the following:
int main()
{
printf("%d",sizeof(int));
}
and the result was 4, which seemed weird (since it is a 64 bit machine)
But, when I printed this:
int main()
{
printf("%d",sizeof(int*));
}
the result was 8, which made more sense.
The question is:
Since I'm using a 64-bit machine, shouldn't a primitive type such as int use 8 bytes (64 bits), and shouldn't sizeof(int) therefore be 8? Why isn't it so?
And why is the size of int* 8?
A bit confused here, so thanks in advance.
No, sizeof(int) is implementation-defined, and is usually 4 bytes.
On the other hand, in order to address more than the 4 GB of memory that 32-bit systems can reach, you need your pointers to be 8 bytes wide. int* just holds the address of "somewhere in memory", and you can't address more than 4 GB of memory with just 32 bits.
The size of a pointer should be 8 bytes on any 64-bit C/C++ compiler, but the same is not true for the size of int.
Wikipedia has a good explanation on that:
In many programming environments for C and C-derived languages on
64-bit machines, "int" variables are still 32 bits wide, but long
integers and pointers are 64 bits wide. These are described as having
an LP64 data model. Another alternative is the ILP64 data model in
which all three data types are 64 bits wide, and even SILP64 where
"short" integers are also 64 bits wide.[citation needed] However, in
most cases the modifications required are relatively minor and
straightforward, and many well-written programs can simply be
recompiled for the new environment without changes. Another
alternative is the LLP64 model, which maintains compatibility with
32-bit code by leaving both int and long as 32-bit. "LL" refers to the
"long long integer" type, which is at least 64 bits on all platforms,
including 32-bit environments.
sizeof(int), sizeof(int*), and the "machine size", though often correlated with each other, can each be independently smaller than, the same as, or larger than the others. About the only C requirement is that int be at least 16 bits (or so); other than that, sizeof(int) and sizeof(int*) are compiler-dependent.
(Although maybe a pointer must be at least an int in size. Hmmm.)
Programmers like to have integer types of 1, 2, 4 and 8 bytes, or 8, 16, 32 and 64 bits. There are only two integer types that can be smaller than int: char and short. If int were 64 bits, then you couldn't have all three of the sizes 8, 16 and 32 bits. That's why compilers tend to make int = 32 bits, so you can have char = 8 bits, short = 16 bits, int = 32 bits, long long = 64 bits and long = 32 or 64 bits.
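To see which data model your compiler uses, a quick sketch (typical LP64 output is 1 2 4 8 8 8; typical LLP64 / 64-bit Windows output is 1 2 4 4 8 8):

#include <stdio.h>

int main(void)
{
    printf("char      : %zu\n", sizeof(char));
    printf("short     : %zu\n", sizeof(short));
    printf("int       : %zu\n", sizeof(int));
    printf("long      : %zu\n", sizeof(long));
    printf("long long : %zu\n", sizeof(long long));
    printf("int *     : %zu\n", sizeof(int *));
    return 0;
}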
size_t is an unsigned integer type; on some implementations it is defined as
typedef unsigned int size_t;
and on others as unsigned long. You should display it with %zu (or cast it and use %u or %lu) instead of %d:
printf("%zu\n", sizet);
printf("%u\n", (unsigned int)sizet);
printf("%lu\n", (unsigned long)sizet);
I was wondering how to reliably determine the size of a character in a portable way. AFAIK sizeof(char) cannot be used because it always yields 1, even on systems where a byte has 16 bits, or even more or fewer.
For example, when dealing with bits, where you need to know exactly how big a character is, I was wondering if this code would give the real size of a character, independent of what the compiler thinks of it. IMO the pointer has to be increased by the compiler by the correct size, so we should get the correct value. Am I right on this, or might there be some hidden problem with pointer arithmetic that would yield wrong results on some systems?
int sizeOfChar()
{
    char *p = 0;
    p++;
    int size_of_char = (int)p;
    return size_of_char;
}
There's a CHAR_BIT macro defined in <limits.h> that evaluates to exactly what its name suggests.
IMO the pointer has to be increased by the compiler by the correct size, so we should get the correct value
No, because pointer arithmetic is defined in terms of sizeof(T) (the pointer target type), and the sizeof operator yields the size in bytes. char is always exactly one byte long, so your code will always yield the null pointer plus one (which may not be the numerical value 1, since a null pointer is not required to be represented by 0).
I think it's not clear what you consider to be "right" (or "reliable", as in the title).
Do you consider "a byte is 8 bits" to be the right answer? If so, for a platform where CHAR_BIT is 16, then you would of course get your answer by just computing:
const int octets_per_char = CHAR_BIT / 8;
No need to do pointer trickery. Also, the trickery is tricky:
On an architecture with 16 bits as the smallest addressable piece of memory, there would be 16 bits at address 0x0000 [1], another 16 bits at address 0x0001, and so on.
So, your example would compute the result 1, since the pointer would likely be incremented from 0x0000 to 0x0001, but that doesn't seem to be what you expect it to compute.
[1] I use a 16-bit address space for brevity; it makes the addresses easier to read.
The size of one char (aka byte) in bits is determined by the macro CHAR_BIT in <limits.h> (or <climits> in C++).
The sizeof operator always returns the size of a type in bytes, not in bits.
So if on some system CHAR_BIT is 16 and sizeof(int) is 4, that means an int has 64 bits on that system.
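A tiny sketch using CHAR_BIT; the commented numbers assume a mainstream platform with 8-bit bytes and a 4-byte int:

#include <stdio.h>
#include <limits.h>

int main(void)
{
    /* CHAR_BIT is the number of bits in a char, i.e. in one byte. */
    printf("bits per char: %d\n", CHAR_BIT);                /* typically 8 */
    printf("bits per int : %zu\n", sizeof(int) * CHAR_BIT); /* typically 32 */
    return 0;
}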
sizeof(char) always returns 1 with a 32-bit GCC compiler.
But since the basic block size of a 32-bit compiler is 4 bytes, how does a char occupy a single byte when the basic size is 4 bytes?
Consider the following:
struct st
{
    int a;
    char c;
};
sizeof(st) returns 8, which agrees with the default block size of 4 bytes (since 2 blocks are allotted).
I can never understand why sizeof(char) returns 1 when it is allotted a block of size 4.
Can someone please explain this? I would be very thankful for any replies explaining it!
sizeof(char) is always 1. Always. The 'block size' you're talking about is just the native word size of the machine - usually the size that will result in most efficient operation. Your computer can still address each byte individually - that's what the sizeof operator is telling you about. When you do sizeof(int), it returns 4 to tell you that an int is 4 bytes on your machine. Likewise, your structure is 8 bytes long. There is no information from sizeof about how many bits there are in a byte.
The reason your structure is 8 bytes long rather than 5 (as you might expect), is that the compiler is adding padding to the structure in order to keep everything nicely aligned to that native word length, again for greater efficiency. Most compilers give you the option to pack a structure, either with a #pragma directive or some other compiler extension, in which case you can force your structure to take minimum size, regardless of your machine's word length.
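For example, a hedged sketch of forcing minimal size with #pragma pack, which GCC, Clang and MSVC all accept (the struct names are made up, and the exact packing behaviour is a compiler extension, not standard C):

#include <stdio.h>

struct padded { int a; char c; };     /* typically 8 bytes: 3 bytes of padding after c */

#pragma pack(push, 1)                 /* ask the compiler for 1-byte packing */
struct packed { int a; char c; };     /* typically 5 bytes; members may be misaligned */
#pragma pack(pop)

int main(void)
{
    printf("padded: %zu\n", sizeof(struct padded));
    printf("packed: %zu\n", sizeof(struct packed));
    return 0;
}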
char is size 1, since that's the smallest access size your computer can handle - for most machines an 8-bit value. The sizeof operator gives you the size of all other quantities in units of how many char objects would be the same size as whatever you asked about. The padding (see link below) is added by the compiler to your data structure for performance reasons, so it is larger in practice than you might think from just looking at the structure definition.
There is a wikipedia article called Data structure alignment which has a good explanation and examples.
It is structure alignment with padding: c uses 1 byte, and the other 3 bytes are unused.
Sample code demonstrating structure alignment:
#include <iostream>

struct st
{
    int a;
    char c;
};

struct stb
{
    int a;
    char c;
    char d;
    char e;
    char f;
};

struct stc
{
    int a;
    char c;
    char d;
    char e;
    char f;
    char g;
};

int main()
{
    std::cout << sizeof(st)  << std::endl; //8
    std::cout << sizeof(stb) << std::endl; //8
    std::cout << sizeof(stc) << std::endl; //12
}
The size of the struct is bigger than the sum of its individual members, since the 32-bit compiler rounds it up to a multiple of 4 bytes. These results may differ between compilers, especially on a 64-bit compiler.
First of all, sizeof returns a number of bytes, not bits. sizeof(char) == 1 tells you that a char is one byte long (which on your machine is eight bits). All of the fundamental data types in C are at least one byte long.
Your structure returns a size of 8. This is a sum of three things: the size of the int, the size of the char (which we know is 1), and the size of any extra padding that the compiler added to the structure. Since many implementations use a 4-byte int, this would imply that your compiler is adding 3 bytes of padding to your structure. Most likely this is added after the char in order to make the size of the structure a multiple of 4 (a 32-bit CPU accesses data most efficiently in 32-bit chunks, and 32 bits is four bytes).
Edit: Just because the block size is four bytes doesn't mean that a data type can't be smaller than four bytes. When the CPU loads a one-byte char into a 32-bit register, the value is sign-extended (or zero-extended, for unsigned types) automatically by the hardware to fill the register. The CPU is smart enough to handle data in N-byte increments (where N is a power of 2), as long as it isn't larger than the register. When storing the data on disk or in memory, there is no reason to store every char as four bytes. The char in your structure only happened to look four bytes long because of the padding added after it. If you changed your structure to have two char variables instead of one, you should see that the size of the structure stays the same (you added an extra byte of data, and the compiler added one fewer byte of padding).
All object sizes in C and C++ are defined in terms of bytes, not bits. A byte is the smallest addressable unit of memory on the computer. A bit is a single binary digit, a 0 or a 1.
On most computers, a byte is 8 bits (so a byte can store values from 0 to 255), although computers exist with other byte sizes.
A memory address identifies a byte, even on 32-bit machines. Addresses N and N+1 point to two subsequent bytes.
An int, which is typically 32 bits, covers 4 bytes, meaning that 4 different memory addresses exist that each point to part of the int.
In a 32-bit machine, all the 32 actually means is that the CPU is designed to work efficiently with 32-bit values, and that an address is 32 bits long. It doesn't mean that memory can only be addressed in blocks of 32 bits.
The CPU can still address individual bytes, which is useful when dealing with chars, for example.
As for your example:
struct st
{
int a;
char c;
};
sizeof(st) returns 8 not because all structs have a size divisible by 4, but because of alignment. For the CPU to read an integer efficiently, it must be located at an address that is divisible by the size of the integer (4 bytes). So an int can be placed at address 8, 12 or 16, but not at address 11.
A char only requires its address to be divisible by the size of a char (1), so it can be placed on any address.
So in theory, the compiler could have given your struct a size of 5 bytes... Except that this wouldn't work if you created an array of st objects.
In an array, each object is placed immediately after the previous one, with no padding. So if the first object in the array is placed at an address divisible by 4, then the next object would be placed at a 5 bytes higher address, which would not be divisible by 4, and so the second struct in the array would not be properly aligned.
To solve this, the compiler inserts padding inside the struct, so its size becomes a multiple of its alignment requirement.
Not because it is impossible to create objects that don't have a size that is a multiple of 4, but because one of the members of your st struct requires 4-byte alignment, and so every time the compiler places an int in memory, it has to make sure it is placed at an address that is divisible by 4.
If you create a struct of two chars, it won't get a size of 4. It will usually get a size of 2, because when it contains only chars, the object can be placed at any address, and so alignment is not an issue.
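A small sketch that makes that padding and alignment visible with offsetof (the two_chars struct is hypothetical, and the commented numbers assume a typical compiler with a 4-byte, 4-aligned int):

#include <stdio.h>
#include <stddef.h>

struct st        { int a; char c; };
struct two_chars { char a; char b; };

int main(void)
{
    printf("sizeof(struct st)        = %zu\n", sizeof(struct st));         /* 8: 3 bytes of padding after c */
    printf("offsetof(struct st, c)   = %zu\n", offsetof(struct st, c));    /* 4: c sits right after the int */
    printf("sizeof(struct two_chars) = %zu\n", sizeof(struct two_chars));  /* 2: chars need no padding */
    return 0;
}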
sizeof returns the value in bytes; you were talking about bits. 32-bit architectures are word-aligned and byte-addressed. It is irrelevant how the architecture stores a char, but to the compiler you must reference chars 1 byte at a time, even if they use up less than 1 byte.
This is why sizeof(char) is 1.
ints are 32 bits, hence sizeof(int) = 4; doubles are 64 bits, hence sizeof(double) = 8, etc.
Because of optimisation, padding is added so that the size of an object is 1, 2 or n*4 bytes (or something like that; talking about x86). That's why padding is added to the 5-byte object and not to the 1-byte one. A single char doesn't have to be padded; it can be allocated in 1 byte, and we can store it in space allocated with malloc(1). st cannot be stored in space allocated with malloc(5), because when an st struct is copied, the whole 8 bytes are copied.
It works the same way as using half a piece of paper. You use one part for a char and the other part for something else. The compiler will hide this from you since loading and storing a char into a 32bit processor register depends on the processor.
Some processors have instructions to load and store only parts of the 32 bits; others have to use binary operations to extract the value of a char.
Addressing a char works because it is, AFAIR by definition, the smallest addressable unit of memory. On a 32-bit system, pointers to two different ints will be at least 4 address points apart, while char addresses can be only 1 apart.