Pointer Conception - c++

Here I get 4225440 as the address of arr[0]. Since it is an integer array, each address increases by 4, so the next one is 4225444.
Now, what happens at the addresses in between?
If I manually plug in one of those addresses, it shows an absurd value. Where does that value come from?
This is the code under discussion
#include <stdio.h>
int arr[10],i,a,*j;
void del(int a);
main()
{
for(i=0;i<4;i++)
scanf("%d",&arr[i]);
j=(int*)4225443;
for(i=0;i<4;i++)
{
printf("\n%d ",arr[i]);
printf(" %d ",&arr[i]);
}
printf(" %d ",*j);
}

j=(int*)4225443;
/* ... */
printf(" %d ",*j);
C has its word to say:
(C11, 6.3.2.3p5) "An integer may be converted to any pointer type. Except as previously specified, the result is implementation-defined, might not be correctly aligned, might not point to an entity of the referenced type, and might be a trap representation."
In your case, on top of that, you are also violating the aliasing rules.

Most of the CPUs that we use today have either
a 32-bit or a 64-bit wide bus between the CPU and the memory.
Let's use the 32-bit wide bus for demonstration purposes.
In general, each memory access will read (or write) 32 bits,
where the first address of those 32 bits will be an address
that is evenly divisible by 4 (addresses count bytes, and 32 bits is 4 bytes).
In such an architecture, an 'int' will start on an address
that is evenly divisible by 4 and be 4 bytes (32 bits) long.
In general, when the address of 'something' is NOT
on a 32-bit address boundary
(i.e. the address is not evenly divisible by 4)
then the CPU will:
for read,
read the whole 32 bits from memory,
starting at the 32 bit boundary,
then, within the CPU,
using the registers and the logic and math operations,
extract the desired byte.
for write,
read the whole 32 bits from memory,
starting at the 32 bit boundary,
then, within the CPU,
using the registers and logic and math operations,
modify the desired byte,
then write the whole 32 bits to memory
In other words,
accessing memory other than on 32-bit boundaries is SLOW.
Unfortunately, some CPUs,
if requested to read/write some value from/to memory
at other than a 32-bit boundary, will raise a bus error.
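A common way to sidestep the alignment problem described above is to copy the bytes instead of dereferencing a misaligned int pointer. The following is a minimal sketch, not taken from the question; load_int_at and store_int_at are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Reads an int that starts at an arbitrary byte offset inside a buffer.
// std::memcpy copies byte by byte, so the source does not have to be
// aligned for int. (Hypothetical helper, for illustration only.)
int load_int_at(const unsigned char* buffer, std::size_t offset) {
    assert(buffer != nullptr);
    int value;
    std::memcpy(&value, buffer + offset, sizeof value);
    return value;
}

// Stores an int at an arbitrary byte offset, again without requiring
// that buffer + offset is suitably aligned for int.
void store_int_at(unsigned char* buffer, std::size_t offset, int value) {
    assert(buffer != nullptr);
    std::memcpy(buffer + offset, &value, sizeof value);
}
```

The caller is still responsible for making sure that offset + sizeof(int) stays inside the buffer; the sketch only removes the alignment requirement, not the bounds requirement.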
Regarding the 'unbelievable' value of the int
when the second byte of the int is modified...
An int (let's use a little-endian architecture) is 4 bytes,
aligned on a 32-bit boundary
(i.e. the lowest address of the int is on a 32-bit boundary).
Let's say, for example, that the int contains 5.
Then its representation in memory, lowest address first, is 0x05, 0x00, 0x00, 0x00.
Then the second byte (address of the int + 1) is set to some value,
for example 3.
Then the int contains 0x05, 0x03, 0x00, 0x00.
Now, when that int is printed, it will display 773 (0x00000305).
Note: the order of the bytes in memory is different
for a big-endian architecture.
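The effect of overwriting one byte of an int can be tried out with a small sketch like the one below. Both function names are hypothetical, and the result depends on the byte order of the machine it runs on.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Returns true when the least significant byte of an unsigned int
// is stored at the lowest address.
bool is_little_endian() {
    const unsigned int probe = 1;
    unsigned char first;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// Returns a copy of `value` whose byte at `index` (counted from the
// lowest address) has been replaced by `byte`.
int with_byte_replaced(int value, std::size_t index, unsigned char byte) {
    assert(index < sizeof value);
    unsigned char bytes[sizeof value];
    std::memcpy(bytes, &value, sizeof value);   // view the int as raw bytes
    bytes[index] = byte;                        // overwrite a single byte
    std::memcpy(&value, bytes, sizeof value);   // reinterpret as an int again
    return value;
}
```

On a little-endian machine, with_byte_replaced(5, 1, 3) yields 773 (0x00000305), because the byte at address + 1 is the second least significant one. On a big-endian machine the same call changes a more significant byte and gives a different number.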

It will print the value located at address 4225443 if that address is accessible; otherwise it will produce a memory access violation.

Related

How to copy part of int64_t to char[4] in c++?

I have a variable:
int64_t label : 40
I want to take the 32 lower bits and put them in a variable of type:
char nol[4]
How can I do that in c++?
Depends on what you mean by "lower" bits. The word "lower" normally implies lower memory address. But that's rarely useful. You may be thinking of least significant instead, which is more commonly useful.
You must also consider what order you want the bytes to be in the array. When copying the lower bytes, you typically want to keep the bytes in the same order as in the integer i.e. native endianness. When copying least significant bytes, you typically want a specific order which may differ from the native endianness i.e. either big or little endian. Big endian is conventionally used in network communication.
If the number of bits to copy is not a multiple of byte size, then copying the incomplete byte adds some complexity.
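Copying the lower bytes in native order is very simple (label is a bit-field, so you cannot take its address; copy it into an ordinary variable first):
int64_t value = label;
char nol[32 / CHAR_BIT];
std::memcpy(nol, &value, sizeof nol);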
Here is an example of copying least significant bytes in big endian order:
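for (std::size_t i = 0; i < sizeof nol; i++) {
    // Byte i (counted from the least significant end) goes to the slot
    // sizeof nol - 1 - i, so the most significant byte lands in nol[0].
    nol[sizeof nol - 1 - i] = (label >> (CHAR_BIT * i)) & UCHAR_MAX;
}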
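Put together as a self-contained sketch, the big-endian variant could look like this. The Packet struct and the function name low32_big_endian are assumptions made for illustration, and the code assumes 8-bit bytes.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>

static_assert(CHAR_BIT == 8, "this sketch assumes 8-bit bytes");

// Hypothetical stand-in for the struct that holds the bit-field
// from the question.
struct Packet {
    std::int64_t label : 40;
};

// Writes the 32 least significant bits of `label` into out[0..3],
// most significant byte first (big-endian order).
void low32_big_endian(std::int64_t label, unsigned char out[4]) {
    assert(out != nullptr);
    // Work on an unsigned copy so the shifts are well defined.
    const std::uint64_t bits = static_cast<std::uint64_t>(label);
    for (std::size_t i = 0; i < 4; ++i) {
        out[3 - i] = static_cast<unsigned char>((bits >> (CHAR_BIT * i)) & 0xFFu);
    }
}
```

For a label of 0x1122334455, the array ends up holding 0x22, 0x33, 0x44, 0x55, regardless of the byte order of the host.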

Tricky interview question for mid-level C++ developer

I was asked this question on the interview, and I can't really understand what is going on here. The question is "What would be displayed in the console?"
#include <iostream>
int main()
{
unsigned long long n = 0;
((char*)&n)[sizeof(unsigned long long)-1] = 0xFF;
n >>= 7*8;
std::cout << n;
}
What is happening here, step by step?
Let's get this one step at a time:
((char*)&n)
This casts the address of the variable n from unsigned long long* to char*. This is legal: accessing an object of a different type through a pointer to char is one of the very few "type punning" cases accepted by the language. In effect it allows you to access the memory of the object n as an array of bytes (aka char in C++).
((char*)&n)[sizeof(unsigned long long)-1]
You access the last byte of the object n. Remember sizeof returns the dimension of a data type in bytes (in C++ char has an alter ego of byte)
((char*)&n)[sizeof(unsigned long long)-1] = 0xFF;
You set the last byte of n to the value 0xFF.
Since n was 0 initially, the memory layout of n is now:
00 ... 00 FF
Now notice the ... I put in the middle. That's not because I am too lazy to copy and paste the value once for every byte of n; it's because the standard does not fix the size of unsigned long long. There are some restrictions, but it can vary from implementation to implementation. So this is the first "unknown". However, on most modern architectures sizeof (unsigned long long) is 8, so we are going to go with this, but in a serious interview you are expected to mention this.
The other "unknown" is how these bytes are interpreted. Unsigned integers are simply encoded in binary. But it can be little endian or big endian. x86 is little endian so we are going with it for the exemplification. And again, in a serious interview you are expected to mention this.
n >>= 7*8;
This right-shifts the value of n by 56 bits. Pay attention: now we are talking about the value of n, not the bytes in memory. With our assumptions (size 8, little endian) the value encoded in memory is 0xFF00000000000000, so shifting it right by 7*8 bits results in the value 0xFF, which is 255.
So, assuming sizeof(unsigned long long) is 8 and a little endian encoding the program prints 255 to the console.
If we are talking about a big endian system, the memory layout after setting the last byte to 0xff is still the same: 00 ... 00 FF, but now the value encoded is 0xFF. So the result of n >>= 7*8; would be 0. In a big endian system the program would print 0 to the console.
As pointed out in the comments, there are other assumptions:
char being 8 bits. Although sizeof(char) is guaranteed to be 1, it doesn't have to have 8 bits. All modern systems I know of have bits grouped in 8-bit bytes.
integers don't have to be little or big endian. There can be other arrangement patterns like middle endian. Being something other than little or big endian is considered esoteric nowadays.
Cast the address of n to a pointer to char, set the char element at index 7 (the last one, assuming sizeof(long long)==8) to 0xff, then right-shift the result (as an unsigned long long) by 56 bits.
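The reasoning above can be checked with a small sketch that pairs the interview snippet with an endianness probe. The function names are hypothetical, and the static_assert spells out the size assumptions made in the answer.

```cpp
#include <cassert>
#include <climits>
#include <cstring>

static_assert(sizeof(unsigned long long) == 8 && CHAR_BIT == 8,
              "sketch assumes a 64-bit unsigned long long made of 8-bit bytes");

// Returns true when the lowest-addressed byte of an unsigned long long
// holds its least significant bits (little endian).
bool lowest_address_holds_low_byte() {
    const unsigned long long probe = 1;
    unsigned char first;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// Same steps as the interview code, but returning the value
// instead of printing it.
unsigned long long last_byte_then_shift() {
    unsigned long long n = 0;
    // Set the byte at the highest address to 0xFF.
    reinterpret_cast<unsigned char*>(&n)[sizeof n - 1] = 0xFF;
    assert(n != 0);  // exactly one byte of n is now non-zero
    n >>= 7 * 8;     // shift the value (not the memory) right by 56 bits
    return n;
}
```

On a little-endian machine last_byte_then_shift() returns 255; on a big-endian machine it returns 0. Middle-endian layouts are not covered by this sketch.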

Writing bits to file?

I'm trying to implement a Huffman tree.
Content of my simple .txt file that I want to do a simple test:
aaaaabbbbccd
Frequencies of characters: a:5, b:4, c:2, d:1
Code Table: (Data type of 1s and 0s: string)
a:0
d:100
c:101
b:11
Result that I want to write as binary: (22 bits)
0000011111111101101100
How can I write bit-by-bit each character of this result as a binary to ".dat" file? (not as string)
Answer: You can't.
The minimum amount you can write to a file (or read from it), is a char or unsigned char. For all practical purposes, a char has exactly eight bits.
You are going to need a one-char buffer and a count of the number of bits it holds. When that number reaches 8, you need to write the buffer out and reset the count to 0. You will also need a way to flush the buffer at the end. (Note that you cannot write 22 bits to a file; you can only write 16 or 24. You will need some way to mark which bits at the end are unused.)
Something like:
struct BitBuffer {
FILE* file; // Initialization skipped.
unsigned char buffer = 0;
unsigned count = 0;
void outputBit(unsigned char bit) {
buffer <<= 1; // Make room for next bit.
if (bit) buffer |= 1; // Set if necessary.
count++; // Remember we have added a bit.
if (count == 8) {
fwrite(&buffer, sizeof(buffer), 1, file); // Error handling elided.
buffer = 0;
count = 0;
}
}
};
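The struct above leaves out the final flush. Below is a minimal in-memory sketch of the same idea with a flush step added; it is a variation for illustration, not the answer's own code. BitWriter, put_code and flush are hypothetical names, and the bytes are collected in a std::vector rather than written to a FILE*.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Collects bits into bytes, most significant bit first.
class BitWriter {
public:
    void put(bool bit) {
        // Make room for the next bit and set it if necessary.
        current_ = static_cast<unsigned char>((current_ << 1) | (bit ? 1 : 0));
        if (++count_ == 8) {
            bytes_.push_back(current_);
            current_ = 0;
            count_ = 0;
        }
    }

    // Appends a code given as a string of '0' and '1' characters.
    void put_code(const std::string& code) {
        for (char c : code) {
            assert(c == '0' || c == '1');
            put(c == '1');
        }
    }

    // Pads the last partial byte with zero bits, stores it, and returns
    // how many padding bits were added (0 if nothing was pending).
    unsigned flush() {
        if (count_ == 0) return 0;
        const unsigned padding = 8 - count_;
        bytes_.push_back(static_cast<unsigned char>(current_ << padding));
        current_ = 0;
        count_ = 0;
        return padding;
    }

    const std::vector<unsigned char>& bytes() const { return bytes_; }

private:
    std::vector<unsigned char> bytes_;
    unsigned char current_ = 0;
    unsigned count_ = 0;
};
```

Feeding the 22-bit string 0000011111111101101100 through put_code and then calling flush gives the three bytes 0x07, 0xFD, 0xB0 with 2 padding bits. The padding count (or the total bit count) has to be stored alongside the data so a decoder knows where to stop.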
The OP asked:
How can I write bit-by-bit each character of this result as a binary to ".dat" file? (not as string)
You can not and here is why...
Memory model
Defines the semantics of a computer memory storage for the purpose of C++ abstract machine.
The memory available to a C++ program is one or more contiguous sequences of bytes. Each byte in memory has a unique address.
Byte
A byte is the smallest addressable unit of memory. It is defined as a contiguous sequence of bits, large enough to hold the value of any UTF-8 code unit (256 distinct values) and of (since C++14) any member of the basic execution character set (the 96 characters that are required to be single-byte). Similar to C, C++ supports bytes of sizes 8 bits and greater.
The types char, unsigned char, and signed char use one byte for both storage and value representation. The number of bits in a byte is accessible as CHAR_BIT or std::numeric_limits<unsigned char>::digits.
Compliments of cppreference.com
You can find this page here: cppreference:memory model
This comes from the 2017-03-21 working draft of the standard:
©ISO/IEC N4659
4.4 The C++ memory model [intro.memory]
The fundamental storage unit in the C++ memory model is the byte. A byte is at least large enough to contain any member of the basic execution character set (5.3) and the eight-bit code units of the Unicode UTF-8 encoding form and is composed of a contiguous sequence of bits,[4] the number of which is implementation-defined. The least significant bit is called the low-order bit; the most significant bit is called the high-order bit. The memory available to a C++ program consists of one or more sequences of contiguous bytes. Every byte has a unique address.
[ Note: The representation of types is described in 6.9. —end note ]
A memory location is either an object of scalar type or a maximal sequence of adjacent bit-fields all having nonzero width. [ Note: Various features of the language, such as references and virtual functions, might involve additional memory locations that are not accessible to programs but are managed by the implementation. —end note ] Two or more threads of execution (4.7) can access separate memory locations without interfering with each other.
[ Note: Thus a bit-field and an adjacent non-bit-field are in separate memory locations, and therefore can be concurrently updated by two threads of execution without interference. The same applies to two bit-fields, if one is declared inside a nested struct declaration and the other is not, or if the two are separated by a zero-length bit-field declaration, or if they are separated by a non-bit-field declaration. It is not safe to concurrently update two bit-fields in the same struct if all fields between them are also bit-fields of nonzero width. —end note ]
[ Example: A structure declared as
struct {
char a;
int b:5,
c:11,
:0,
d:8;
struct {int ee:8;} e;
}
contains four separate memory locations: The field a and bit-fields d and e.ee are each separate memory locations, and can be modified concurrently without interfering with each other. The bit-fields b and c together constitute the fourth memory location. The bit-fields b and c cannot be concurrently modified, but b and a, for example, can be. —end example ]
4) The number of bits in a byte is reported by the macro CHAR_BIT in the header <climits>.
This version of the standard can be found here:
www.open-std.org section § 4.4 on pages 8 & 9.
The smallest unit of memory that a program can write is one byte, which is 8 or more contiguous bits. Even with bit-fields, the one-byte requirement still holds. You can manipulate, toggle, or set individual bits within a byte, but you cannot write individual bits on their own.
What can be done is to have a byte buffer with a count of bits written. When your required bits are written you will need to have the rest of the unused bits marked as padding or un-used buffer bits.
Edit
[Note:] -- When using bit-fields or unions, one thing that you must take into consideration is the endianness of the specific architecture.
Answer: You can, in a way.
From my experience, there is a simple way to do that. For the task you need to define an array of characters (a single byte is enough, but it can be bigger). After that you must define functions to access a specific bit of any element. For example, here is how to get the value of the 3rd bit (or any other bit) of a char in C++:
/* position is in [1, ..., n], counted from the least significant bit
   (little-endian bit order); bytes are indexed from 0.
   Returns 0 or 1. */
int bit_at(int position, unsigned char byte)
{
    return (byte >> (position - 1)) & 1;
}
Now you can picture the array of bytes like this:
[b1,...,bn]
What we actually have in memory is 8 * n bits.
We can try to visualize it like so.
NOTE: the array is zeroed!
|0000 0000|0000 0000|...|0000 0000|
From this, you can figure out how to manipulate it to get or set a specific bit of the array. Of course some sort of conversion (from a bit index to a byte index plus a bit position) is needed, but that is not much of a problem.
In the end, for the encoding you provide, that is:
a:0
d:100
c:101
b:11
We can encode the message "abcd",
and make an array that holds the bits
of the message, using the elements
of the array as arrays for bits, like so:
|0111 0110|0000 0000|
You can write this to memory and you will have an excess of at most 7 bits.
This is a simple example, but it can be extended into much more.
I hope this gave some answers to your question.
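The "array of bytes used as an array of bits" idea can be sketched as follows. The names get_bit and set_bit are hypothetical, and the sketch numbers bits from 0 starting at the most significant bit of the first byte, which is the layout used in the diagram above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Bit `index` counts from 0; bit 0 is the most significant bit of bytes[0].
bool get_bit(const std::vector<unsigned char>& bytes, std::size_t index) {
    assert(index / 8 < bytes.size());
    return ((bytes[index / 8] >> (7 - index % 8)) & 1u) != 0;
}

// Sets or clears the bit at `index`, leaving all other bits untouched.
void set_bit(std::vector<unsigned char>& bytes, std::size_t index, bool value) {
    assert(index / 8 < bytes.size());
    const unsigned char mask = static_cast<unsigned char>(1u << (7 - index % 8));
    if (value) {
        bytes[index / 8] = static_cast<unsigned char>(bytes[index / 8] | mask);
    } else {
        bytes[index / 8] = static_cast<unsigned char>(bytes[index / 8] & ~mask);
    }
}
```

Writing the nine bits 0 11 101 100 (the codes for a, b, c, d) into a zeroed two-byte array with set_bit produces the bytes 0x76 and 0x00, matching the |0111 0110|0000 0000| picture.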

printing the address of a pointer variable

I have learnt that while printing the address of a variable in c, we use unsigned int (%u in format string).
Its range is from 0 to 65535. Now an integer takes 4 bytes of memory, which means a maximum of 16384 (65536/4) integers can be stored. What will happen if we try to declare an array int a[20000] and get addresses of each of its element?
#include<stdio.h>
int main(void)
{
int a[20000];
for(int i=0; i<19999; i++)
printf("%u", &a[i]);
}
In the early days of C, a pointer and an int were similar types, and you could safely cast from one to the other and back again. At that time, pointers and int were both 16 bits long. It is still true on many 32-bit systems, where int and pointers are both 32 bits. But it is false on 64-bit systems, because pointers are 64 bits long and int is only 32 bits.
So I do not know where and how you learnt that we use unsigned int (%u) to print the address of a variable in C, but forget it ASAP, because in the general case it is wrong. The only foolproof way to print an address is %p (with the argument cast to void *), because the implementation automatically adapts the size to 16, 32 or 64 bits.
And stop converting pointers to int or the other way around, because it is highly non-portable. From that other post: a more portable way (in C99 and later) is to #include <stdint.h> and then cast pointers to intptr_t (and back). This integer type, where it is provided, is guaranteed to be able to hold a pointer value.
And, I almost forgot: 65536 = 0x10000 = 2^16, and 2^32 = 4294967296. So you were not that far from reality, and it is true that on older 16-bit systems you could not have int array[40000], because int was 16 bits long and the array would exhaust all the available memory.
But on 32-bit systems you can address 4 GB of memory, so int array[20000] is harmless.
A pointer is a memory address where you can find some data. We can find the size of a pointer variable using the sizeof operator. Its size doesn't depend on what it points at. It does, however, depend on how many bytes a memory address takes up on your system, which is typically 4 for a 32-bit compiler and 8 for a 64-bit compiler.
If we have declared a pointer, double *j, the type of j is double *, i.e. "a pointer to double".
%p is the correct format specifier for printing a pointer. %p outputs addresses in the hexadecimal notation.
Sometimes people use the %u and %x (unsigned int in hexadecimal form) specifiers to print a pointer variable. It is, however, undefined behavior to pass a pointer for a %x or %u argument.
It often appears to work with 32-bit compilers (for example, the one bundled with Code::Blocks), because unsigned int and pointers have the same size there (both 4 bytes).
(It is false to assume that int and pointers have the same width . For both GCC 64 bit and MSVC 64 bit running on x64, sizeof(void *) == 8, while sizeof(unsigned int) == 4. It just so happens that on some architectures pointers and ints are the same size, e.g. the PDP-11/34a, or most 32 bit architectures nowadays. But it is extremely unwise to ever write code that relies on it.
You can add the 2 extra lines below and verify (%zu is the correct specifier for the size_t value that sizeof yields):
printf("size of unsigned int is %zu\n", sizeof(unsigned int));
printf("size of pointer is %zu\n", sizeof(int *));
On a 64-bit GCC machine with a 64-bit operating system, this should give you 4 and 8 respectively.)
With 64-bit GCC, %x treats its argument as an unsigned int (32 bits long), while a pointer is 8 bytes (64 bits) long. Printing with %p prints the whole pointer, in its complete size of 64 bits. When you print with %x, typically only the lower 32 bits are printed (and the behavior is formally undefined). Hence the safe way to print a pointer is always %p.
Actually, unsigned integers have a range of 0 to 65535 on older 16-bit compilers like Turbo C. Nowadays most systems have a 32-bit or 64-bit architecture, on which the range of unsigned int is 0 to 4294967295 (about 4G). So this code should work fine with recent compilers like gcc (under Linux) or Visual Studio (under Windows). Try switching to these compilers. They are very good and are widely used nowadays. 16-bit compilers are obsolete; avoid using them. If you are using Windows, then Code::Blocks or Dev-C++ are some good programming IDEs for learning C.
P.S. avoid using %u for printing addresses. Use %p instead.
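The two portable options mentioned in the answers, %p and a cast to an integer type wide enough for a pointer, can be sketched like this. The function names are hypothetical; std::uintptr_t is an optional type, so the sketch assumes the platform provides it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Formats a pointer with %p. The exact text (hex digits, 0x prefix, ...)
// is implementation-defined.
std::string format_pointer(const void* p) {
    char buffer[64];
    // %p expects a void*, so the const is cast away for the call only.
    const int written = std::snprintf(buffer, sizeof buffer, "%p", const_cast<void*>(p));
    assert(written > 0 && static_cast<std::size_t>(written) < sizeof buffer);
    return std::string(buffer);
}

// Converts a pointer to an unsigned integer that is wide enough to hold it.
std::uintptr_t address_as_integer(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p);
}
```

For an array int a[20000], address_as_integer(&a[1]) - address_as_integer(&a[0]) equals sizeof(int) on the usual flat-memory platforms, however large the addresses themselves are. That mapping from pointers to integers is implementation-defined, so treat the numbers as informational only.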

Understanding sizeof(char) in 32 bit C compilers

sizeof(char) always returns 1 with the 32-bit GCC compiler.
But since the basic block size with a 32-bit compiler is 4, how does char occupy a single byte when the basic size is 4 bytes???
Considering the following :
struct st
{
int a;
char c;
};
sizeof(st) returns 8, which agrees with the default block size of 4 bytes (since 2 blocks are allotted).
I can never understand why sizeof(char) returns 1 when it is allotted a block of size 4.
Can someone pls explain this???
I would be very thankful for any replies explaining it!!!
EDIT : The typo of 'bits' has been changed to 'bytes'. I ask Sorry to the person who made the first edit. I rollbacked the EDIT since I did not notice the change U made.
Thanks to all those who made it a point that It must be changed especially #Mike Burton for downvoting the question and to #jalf who seemed to jump to conclusions over my understanding of concepts!!
sizeof(char) is always 1. Always. The 'block size' you're talking about is just the native word size of the machine - usually the size that will result in most efficient operation. Your computer can still address each byte individually - that's what the sizeof operator is telling you about. When you do sizeof(int), it returns 4 to tell you that an int is 4 bytes on your machine. Likewise, your structure is 8 bytes long. There is no information from sizeof about how many bits there are in a byte.
The reason your structure is 8 bytes long rather than 5 (as you might expect), is that the compiler is adding padding to the structure in order to keep everything nicely aligned to that native word length, again for greater efficiency. Most compilers give you the option to pack a structure, either with a #pragma directive or some other compiler extension, in which case you can force your structure to take minimum size, regardless of your machine's word length.
char is size 1, since that's the smallest access size your computer can handle - for most machines an 8-bit value. The sizeof operator gives you the size of all other quantities in units of how many char objects would be the same size as whatever you asked about. The padding (see link below) is added by the compiler to your data structure for performance reasons, so it is larger in practice than you might think from just looking at the structure definition.
There is a wikipedia article called Data structure alignment which has a good explanation and examples.
It is structure alignment with padding: c uses 1 byte, and the remaining 3 bytes are unused.
Sample code demonstrating structure alignment:
struct st
{
int a;
char c;
};
struct stb
{
int a;
char c;
char d;
char e;
char f;
};
struct stc
{
int a;
char c;
char d;
char e;
char f;
char g;
};
std::cout<<sizeof(st) << std::endl; //8
std::cout<<sizeof(stb) << std::endl; //8
std::cout<<sizeof(stc) << std::endl; //12
The size of the struct is bigger than the sum of its individual components, since it was set to be divisible by 4 bytes by the 32 bit compiler. These results may be different on different compilers, especially if they are on a 64 bit compiler.
First of all, sizeof returns a number of bytes, not bits. sizeof(char) == 1 tells you that a char is one byte long; a byte is CHAR_BIT bits, which is eight on almost every current machine. All of the fundamental data types in C are at least one byte long.
Your structure returns a size of 8. This is a sum of three things: the size of the int, the size of the char (which we know is 1), and the size of any extra padding that the compiler added to the structure. Since many implementations use a 4-byte int, this would imply that your compiler is adding 3 bytes of padding to your structure. Most likely this is added after the char in order to make the size of the structure a multiple of 4 (a 32-bit CPU accesses data most efficiently in 32-bit chunks, and 32 bits is four bytes).
Edit: Just because the block size is four bytes doesn't mean that a data type can't be smaller than four bytes. When the CPU loads a one-byte char into a 32-bit register, the value will be sign-extended or zero-extended (depending on whether the type is signed) to make it fill the register. The CPU is smart enough to handle data in N-byte increments (where N is a power of 2), as long as it isn't larger than the register. When storing the data on disk or in memory, there is no reason to store every char as four bytes. The char in your structure happened to look like it was four bytes long because of the padding added after it. If you changed your structure to have two char variables instead of one, you should see that the size of the structure is the same (you added an extra byte of data, and the compiler added one fewer byte of padding).
All object sizes in C and C++ are defined in terms of bytes, not bits. A byte is the smallest addressable unit of memory on the computer. A bit is a single binary digit, a 0 or a 1.
On most computers, a byte is 8 bits (so a byte can store values from 0 to 255), although computers exist with other byte sizes.
A memory address identifies a byte, even on 32-bit machines. Addresses N and N+1 point to two subsequent bytes.
An int, which is typically 32 bits covers 4 bytes, meaning that 4 different memory addresses exist that each point to part of the int.
In a 32-bit machine, all the 32 actually means is that the CPU is designed to work efficiently with 32-bit values, and that an address is 32 bits long. It doesn't mean that memory can only be addressed in blocks of 32 bits.
The CPU can still address individual bytes, which is useful when dealing with chars, for example.
As for your example:
struct st
{
int a;
char c;
};
sizeof(st) returns 8 not because all structs have a size divisible by 4, but because of alignment. For the CPU to efficiently read an integer, it must be located at an address that is divisible by the size of the integer (4 bytes). So an int can be placed at address 8, 12 or 16, but not at address 11.
A char only requires its address to be divisible by the size of a char (1), so it can be placed on any address.
So in theory, the compiler could have given your struct a size of 5 bytes... Except that this wouldn't work if you created an array of st objects.
In an array, each object is placed immediately after the previous one, with no padding. So if the first object in the array is placed at an address divisible by 4, then the next object would be placed at a 5 bytes higher address, which would not be divisible by 4, and so the second struct in the array would not be properly aligned.
To solve this, the compiler inserts padding inside the struct, so its size becomes a multiple of its alignment requirement.
Not because it is impossible to create objects that don't have a size that is a multiple of 4, but because one of the members of your st struct requires 4-byte alignment, and so every time the compiler places an int in memory, it has to make sure it is placed at an address that is divisible by 4.
If you create a struct of two chars, it won't get a size of 4. It will usually get a size of 2, because when it contains only chars, the object can be placed at any address, and so alignment is not an issue.
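The array argument above can be checked with sizeof, alignof and offsetof. The following is a minimal sketch; St mirrors the struct from the question, while TwoChars and the two helper functions are hypothetical names added for illustration.

```cpp
#include <cassert>
#include <cstddef>

struct St {
    int a;
    char c;
};

struct TwoChars {
    char x;
    char y;
};

// Bytes of padding the compiler placed after the last member of St.
std::size_t tail_padding_of_st() {
    return sizeof(St) - (offsetof(St, c) + sizeof(char));
}

// Distance in bytes between two consecutive elements of an St array.
std::size_t array_stride_of_st() {
    St items[2] = {};
    const char* first = reinterpret_cast<const char*>(&items[0]);
    const char* second = reinterpret_cast<const char*>(&items[1]);
    assert(second > first);
    return static_cast<std::size_t>(second - first);
}
```

The stride between array elements is exactly sizeof(St), and sizeof(St) is a multiple of alignof(St), which is why the padding sits inside the struct rather than between array elements. With a 4-byte, 4-aligned int, tail_padding_of_st() is 3, while sizeof(TwoChars) is normally 2 because no member requires more than 1-byte alignment.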
sizeof returns the value in bytes. You were talking about bits. 32-bit architectures are word aligned and byte addressed. It is irrelevant how the architecture stores a char; to the compiler, you must reference chars 1 byte at a time, even if they use up less than 1 byte.
This is why sizeof(char) is 1.
ints are 32 bit, hence sizeof(int)= 4, doubles are 64 bit, hence sizeof(double) = 8, etc.
Because of optimisation, padding is added so that the size of an object is 1, 2 or n*4 bytes (or something like that, talking about x86). That's why padding is added to a 5-byte object but not to a 1-byte one. A single char doesn't have to be padded: it can occupy 1 byte, and we can store it in space allocated with malloc(1). st cannot be stored in space allocated with malloc(5), because when an st struct is copied, the whole 8 bytes are copied.
It works the same way as using half a piece of paper: you use one part for a char and the other part for something else. The compiler hides this from you, since how a char is loaded into and stored from a 32-bit processor register depends on the processor.
Some processors have instructions to load and store only part of the 32 bits; others have to use bitwise operations to extract the value of a char.
Addressing a char works because a char is, AFAIR, by definition the smallest addressable unit of memory. On a 32-bit system, pointers to two different ints will be at least 4 address points apart, while char addresses will be only 1 apart.