Difference between various ways of using memset function - c++

What is the difference between the following three commands?
Suppose we declare an array arr having 10 elements.
int arr[10];
Now the commands are:
Command 1:
memset(arr,0,sizeof(arr));
and
Command 2:
memset(arr,0,10*sizeof(int));
These two commands run fine in a program, but the following command does not:
Command 3:
memset(arr,0,10);
So what is the difference between the 3 commands?

Case #1: sizeof(arr) returns 10 * sizeof(int)
Case #2: sizeof(int) * 10 returns the same thing
Case #3: 10 returns 10
An int takes up more than one byte (usually 4 on 32 bit). So if you did 40 for case three, it would probably work. But never actually do that.

memset's 3rd parameter is how many bytes to fill. So here you're telling memset to set 10 bytes:
memset(arr,0,10);
But arr isn't necessarily 10 bytes. (In fact, it isn't.) You need to know how many bytes are in arr, not how many elements.
The size of an int is not guaranteed to be 1 byte. On most modern PC-type hardware, it's going to be 4 bytes.
You should not assume the size of any particular datatype, except char which is guaranteed to be 1 byte exactly. For everything else, you must determine the size of it (at compile time) by using sizeof.

memset(arr,0,sizeof(arr)) fills arr with sizeof(arr) zeros -- as bytes. sizeof(arr) is correct in this case, but beware using this approach on pointers rather than arrays.
memset(arr,0,10*sizeof(int)) fills arr with 10*sizeof(int) zeros, again as bytes. This is once again the correct size in this case, but it is more fragile than the first. What if arr does not contain 10 elements, or the type of each element is not int? For example, you find you are getting overflow and change int arr[10] to long long arr[10]; the memset call would then silently clear only part of the array.
memset(arr,0,10) fills the first 10 bytes of arr with zeros. This clearly is not what you want.
None of these is very C++-like. It would be much better to use std::fill, which you get from the <algorithm> header file. For example, std::fill(arr, arr+10, 0).
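The difference between the three calls can be checked with a small sketch. This is not from any of the answers above: the helper names zeroed_after_memset and zeroed_after_fill are made up for illustration, and the marker value -1 assumes a two's complement int (so that every byte of it is non-zero).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: fill a 10-int array with a non-zero marker, zero the
// first `bytes` bytes with memset, and count how many elements ended up 0.
std::size_t zeroed_after_memset(std::size_t bytes) {
    int arr[10];
    std::fill(arr, arr + 10, -1);      // marker: all bytes of each int set
    assert(bytes <= sizeof(arr));      // never write past the array
    std::memset(arr, 0, bytes);
    return static_cast<std::size_t>(std::count(arr, arr + 10, 0));
}

// The std::fill alternative works in elements, so no byte arithmetic is needed.
std::size_t zeroed_after_fill() {
    int arr[10];
    std::fill(arr, arr + 10, -1);
    std::fill(arr, arr + 10, 0);
    return static_cast<std::size_t>(std::count(arr, arr + 10, 0));
}
```

With a 4-byte int, zeroed_after_memset(10) reports only 2 fully cleared elements, while zeroed_after_memset(10 * sizeof(int)) and zeroed_after_fill() both report 10.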

Related

How does this code work without any errors?

I wrote this code, which sets the array to 0:
int arr[4];
memset(arr, 0, sizeof (arr));
Very simple, but how does this code work without any errors? sizeof(arr) is 16 (array size 4 * 4 bytes per int), while the size I used when I declared the array is 4. How can memset set 16 bits to zero when the array I passed as a parameter has a size of 4?
I used memset(arr, 0, sizeof(arr)/sizeof(*arr)); to get the real size of the array, and the result was accurate: it gives me 4. But how does the code above work correctly?
memset sets 16 bytes (not bits) to 0. This is correct because the size of your array is 16 bytes, as you correctly stated (4 integers x 4 bytes per integer). sizeof knows the number of elements in your array and it knows the size of each element. As you can see in the docs, the third argument of memset takes the number of bytes, not the number of elements. http://www.cplusplus.com/reference/cstring/memset/
But be careful with using sizeof() when the array is passed to a function as int x[] or int* x. For example, the following piece of code will not do what you expect:
void foo(int arr[]) {
    auto s = sizeof(arr); // careful: arr is really an int* here, so this is the size of a pointer, not the size of the array
    ...
}
int a[10];
foo(a);
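One way around the pointer decay shown above, sketched under the assumption that the caller still has a real array in scope. The names zero_ints and zero_array are hypothetical and do not come from the answers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Inside this function `arr` is an int*, so sizeof(arr) would be the size of
// a pointer. The element count has to be passed in explicitly.
void zero_ints(int* arr, std::size_t count) {
    assert(arr != nullptr);
    std::memset(arr, 0, count * sizeof(*arr));
}

// Taking the array by reference keeps its full type, so sizeof(arr) is the
// size of the whole array and N is deduced at compile time.
template <std::size_t N>
std::size_t zero_array(int (&arr)[N]) {
    std::memset(arr, 0, sizeof(arr));
    return sizeof(arr);   // number of bytes that were cleared
}
```

For int a[10], zero_array(a) clears and returns 10 * sizeof(int) bytes, whereas a function declared as foo(int arr[]) has lost that information.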
The third parameter is number of bytes. Which is 4*4=16 for your case.
memset
Actually the first solution is the correct one.
The function memset takes as third parameter the number of bytes to set to zero.
num:
Number of bytes to be set to the value.
sizeof returns the number of bytes occupied by the expression.
In your case sizeof(arr) = 16, which is exactly the number of bytes requested by the memset function.
Your second solution:
memset(arr, 0, sizeof(arr)/sizeof(*arr)); // Note that: sizeof(arr)/sizeof(*arr) == 16 / 4 (generally) == 4 bytes
will set only the first 4 bytes to zero, that is the first integer of the array. So that solution is wrong if your intent is to set each element of the array to zero.

The operation of the sizeof operator in C++

On my MS VS 2015 compiler, the sizeof int is 4 (bytes). But the sizeof vector<int> is 16. As far as I know, a vector is like an empty box when it's not initialized yet, so why is it 16? And why 16 and not another number?
Furthermore, if we have vector<int> v(25); and then initialize it with int numbers, then still the size of v is 16 although it has 25 int numbers! The size of each int is 4 so the sizeof v should then be 25*4 bytes seemingly but in effect, it is still 16! Why?
The size of each int is 4 so the sizeof v should then be 25*4 bytes seemingly but in effect, it is still 16! Why?
You're confusing sizeof(std::vector) and std::vector::size(), the former will return the size of vector itself, not including the size of elements it holds. The latter will return the count of the elements, you can get all their size by std::vector::size() * sizeof(int).
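The distinction can be sketched as follows. The names vector_object_bytes and payload_bytes are invented for this illustration; the actual value of sizeof(std::vector<int>) depends on the standard library implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Size of the vector object itself: fixed at compile time and independent of
// how many elements the vector currently holds.
constexpr std::size_t vector_object_bytes = sizeof(std::vector<int>);

// Bytes occupied by the elements currently stored in the vector.
std::size_t payload_bytes(const std::vector<int>& v) {
    return v.size() * sizeof(int);
}
```

For std::vector<int> v(25) with a 4-byte int, payload_bytes(v) is 100, while sizeof(v) stays at whatever the implementation uses for the object itself (16 in the question).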
so why is it 16? And why 16 and not another number?
What sizeof(std::vector) is depends on the implementation; it is mostly implemented with three pointers. In some cases (such as debug mode) the size might increase to hold extra bookkeeping.
std::vector is typically a structure which contains two elements: pointer (array) of its elements and size of the array (number of elements).
As size is sizeof(void *) and the pointer is also sizeof(void *), the size of the structure is 2*sizeof(void *) which is 16.
The number of elements has nothing to do with the size as the elements are allocated on the heap.
EDIT: As M.M mentioned, the implementation could be different, like the pointer, start, end, allocatedSize. So in 32-bit environment that should be 3*sizeof(size_t)+sizeof(void *) which might be the case here. Even the original could work with start hardcoded to 0 and allocatedSize computed by masking end so really dependent on implementation. But the point remains the same.
sizeof is evaluated at compile time, so it only counts the size of the variables declared in the class, which probably includes a couple of counters and a pointer. It's what the pointer points to that varies with the size, but the compiler doesn't know about that.
The size can be explained using pointers which can be: 1) begin of vector 2) end of vector and 3) vector's capacity. So it would be more of like implementation dependent and it will change for different implementation.
You seem to be mixing "array" with "vector". If you have a local array, sizeof will provide the size of the array indeed. However, vector is not an array. It is a class, a container from STL guaranteeing that the memory contents are located within a single block of memory (that may get relocated, if vector grows).
Now, if you take a look at the std::vector implementation, you'll notice it contains fields (at least in MSVC 14.0):
size_type _Count = 0;
typename _Alloc_types::_Alty _Alval; // allocator object (from base)
_Mylast
_Myfirst
That could sum up to 16 bytes under your implementation (note: experience may vary).

Fast way to set all elements of a 2D array to a certain non-zero value in C++

Say I want to assign 5 to all the elements in a 2d array. First I tried memset
int a[3][4];
memset(a, 5, sizeof a);
and
int a[3][4];
memset(a, 5, sizeof(a[0][0])*3*4);
But the result is the same either way: all the elements become 84215045.
Then I tried fill_n, and the build failed. It seems fill_n cannot handle a 2D array.
So is there any fast way to set all the elements of a 2D array to a certain value in C++?
********************************************************************************
UPDATE
Thanks @paddy for the answer. Actually fill_n does work. The way I used it was like this, which fails to build with my compiler.
fill_n(a,3*4,5);
@paddy's answer is correct; we can use it in this way for a 2D array.
fill_n(a[0],3*4,5);
Then I tried a little more, I found we can actually use this to deal with a 3d array, but it should be like this. Say for a[3][4][5].
fill_n(a[0][0],3*4*5,5);
Unfortunately, memset is only useful for setting every byte to a value. But that won't work when you want to set groups of bytes. Because of the memory layout of a 2D array, it's actually okay to use std::fill_n or std::fill from the first value:
std::fill_n( a[0], 3 * 4, 5 );
std::fill( a[0], a[3], 5 ); // Note a[3] is one past the end of array.
Depending on your compiler, something like this might even be vectorized for even faster execution. But even without that, you ought not to worry about speed -- std::fill is plenty fast.
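A short sketch of why memset produced 84215045 and how the element-wise fill behaves. The function names are hypothetical, and the specific value assumes a 4-byte int.

```cpp
#include <algorithm>
#include <cassert>
#include <cstring>

// What memset(a, 5, ...) stores in each int: every byte becomes 0x05, so a
// 4-byte int reads back as 0x05050505, which is 84215045 in decimal.
int int_after_memset_5() {
    int x = 0;
    std::memset(&x, 5, sizeof x);
    return x;
}

// Element-wise fill of a 3x4 array through a pointer to its first int,
// following the std::fill_n call shown above.
bool fill_2d_with(int value) {
    int a[3][4];
    std::fill_n(a[0], 3 * 4, value);
    for (const auto& row : a)
        for (int cell : row)
            if (cell != value) return false;
    return true;
}
```

fill_2d_with(5) reports true because std::fill_n assigns whole int values, whereas memset only ever writes individual bytes.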
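sizeof(a) gives you the size of a pointer in bytes (4 on a typical 32-bit system, 8 on a 64-bit one) only when a has decayed to a pointer, for example when it is a function parameter. For an array declared as int a[3][4] in the same scope, sizeof(a) is already the size of the whole array.
To spell out the size of an int array of 3x4 elements, you can use sizeof(int)*(3*4).
So memset(a, 5, sizeof(int)*(3*4)); covers the same bytes as memset(a, 5, sizeof a); and it still sets every byte, not every int, to 5.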

Confusion in Memory Addressing with Arrays

Let's have an array of type int:
int arr[5];
Now,
if arr[0] is at address 100 then
Why do we have;
arr[1] at address 102 ,
arr[2] at address 104 and so on.
Instead of
arr[1] at address 101 ,
arr[2] at address 102 and so on.
Is it because an integer takes 2 bytes?
Does each memory block have a capacity of 1 byte (whether it is a 32-bit or a 64-bit processor)?
Your first example is consistent with 16-bit ints.
As to your second example (&arr[0]==100, &arr[1]==101, &arr[2]==103), this can't possibly be a valid layout since the distance between consecutive elements varies between the first pair and the second.
Is it because an integer takes 2 bytes?
Yes
Apparently on your system, int has a size of 2. On other systems, this might not be the case. Usually int is 4 bytes, but other sizes such as 2 or 8 are possible as well.
You are right, on your machine the sizeof int is 2, so next possible value in the array will be 2 bytes away from the previous one.
-------------------------------
|100|101|102|103|104|105|106....
-------------------------------
|arr[0] |arr[1] |arr[2] |
There is no guarantee regarding the exact size of int. The C++ spec only requires that sizeof(int) >= sizeof(short) >= sizeof(char) and that int has at least 16 bits. It depends upon the processor, compiler etc.
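The spacing between elements can be measured directly instead of assumed. A minimal sketch, with the hypothetical helper name int_stride_bytes:

```cpp
#include <cassert>
#include <cstddef>

// Distance in bytes between two consecutive elements of an int array.
std::ptrdiff_t int_stride_bytes() {
    int arr[5] = {};
    const char* first  = reinterpret_cast<const char*>(&arr[0]);
    const char* second = reinterpret_cast<const char*>(&arr[1]);
    return second - first;   // always equals sizeof(int)
}
```

On a system with a 2-byte int this returns 2, matching the addresses 100, 102, 104 from the question; with a 4-byte int it returns 4.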

Understanding sizeof(char) in 32 bit C compilers

sizeof(char) always returns 1 in the 32 bit GCC compiler.
But since the basic block size in a 32 bit compiler is 4, how does char occupy a single byte when the basic size is 4 bytes???
Considering the following :
struct st
{
int a;
char c;
};
sizeof(st) returns as 8 as agreed with the default block size of 4 bytes (since 2 blocks are allotted)
I can never understand why sizeof(char) returns as 1 when it is allotted a block of size 4.
Can someone pls explain this???
I would be very thankful for any replies explaining it!!!
EDIT: The typo of 'bits' has been changed to 'bytes'. I apologize to the person who made the first edit; I rolled back the edit since I did not notice the change you made.
Thanks to all those who pointed out that it must be changed, especially @Mike Burton for downvoting the question and @jalf, who seemed to jump to conclusions over my understanding of concepts!!
sizeof(char) is always 1. Always. The 'block size' you're talking about is just the native word size of the machine - usually the size that will result in most efficient operation. Your computer can still address each byte individually - that's what the sizeof operator is telling you about. When you do sizeof(int), it returns 4 to tell you that an int is 4 bytes on your machine. Likewise, your structure is 8 bytes long. There is no information from sizeof about how many bits there are in a byte.
The reason your structure is 8 bytes long rather than 5 (as you might expect), is that the compiler is adding padding to the structure in order to keep everything nicely aligned to that native word length, again for greater efficiency. Most compilers give you the option to pack a structure, either with a #pragma directive or some other compiler extension, in which case you can force your structure to take minimum size, regardless of your machine's word length.
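As a rough sketch of the packing option mentioned above: #pragma pack is a compiler extension rather than standard C++, accepted in this form by MSVC, GCC and Clang. The struct names Padded and Packed are invented for the example.

```cpp
#include <cassert>
#include <cstddef>

// Default layout: the compiler may add padding after `c`.
struct Padded {
    int a;
    char c;
};

// Packed layout: members are placed back to back with no padding.
// push/pop restores the previous packing for the rest of the file.
#pragma pack(push, 1)
struct Packed {
    int a;
    char c;
};
#pragma pack(pop)
```

With a 4-byte int, sizeof(Padded) is typically 8 and sizeof(Packed) is 5. Packed members may be misaligned, so access can be slower (or unsupported on some architectures).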
char is size 1, since that's the smallest access size your computer can handle - for most machines an 8-bit value. The sizeof operator gives you the size of all other quantities in units of how many char objects would be the same size as whatever you asked about. The padding (see link below) is added by the compiler to your data structure for performance reasons, so it is larger in practice than you might think from just looking at the structure definition.
There is a wikipedia article called Data structure alignment which has a good explanation and examples.
It is structure alignment with padding. c uses 1 byte, 3 bytes are non used. More here
Sample code demonstrating structure alignment:
struct st
{
int a;
char c;
};
struct stb
{
int a;
char c;
char d;
char e;
char f;
};
struct stc
{
int a;
char c;
char d;
char e;
char f;
char g;
};
std::cout<<sizeof(st) << std::endl; //8
std::cout<<sizeof(stb) << std::endl; //8
std::cout<<sizeof(stc) << std::endl; //12
The size of the struct is bigger than the sum of its individual components, since it was set to be divisible by 4 bytes by the 32 bit compiler. These results may be different on different compilers, especially if they are on a 64 bit compiler.
First of all, sizeof returns a number of bytes, not bits. sizeof(char) == 1 tells you that a char is one byte long (eight bits on virtually all current platforms). All of the fundamental data types in C are at least one byte long.
Your structure returns a size of 8. This is a sum of three things: the size of the int, the size of the char (which we know is 1), and the size of any extra padding that the compiler added to the structure. Since many implementations use a 4-byte int, this would imply that your compiler is adding 3 bytes of padding to your structure. Most likely this is added after the char in order to make the size of the structure a multiple of 4 (a 32-bit CPU accesses data most efficiently in 32-bit chunks, and 32 bits is four bytes).
Edit: Just because the block size is four bytes doesn't mean that a data type can't be smaller than four bytes. When the CPU loads a one-byte char into a 32-bit register, the value will be sign-extended automatically (by the hardware) to make it fill the register. The CPU is smart enough to handle data in N-byte increments (where N is a power of 2), as long as it isn't larger than the register. When storing the data on disk or in memory, there is no reason to store every char as four bytes. The char in your structure happened to look like it was four bytes long because of the padding added after it. If you changed your structure to have two char variables instead of one, you should see that the size of the structure is the same (you added an extra byte of data, and the compiler added one fewer byte of padding).
All object sizes in C and C++ are defined in terms of bytes, not bits. A byte is the smallest addressable unit of memory on the computer. A bit is a single binary digit, a 0 or a 1.
On most computers, a byte is 8 bits (so a byte can store values from 0 to 255), although computers exist with other byte sizes.
A memory address identifies a byte, even on 32-bit machines. Addresses N and N+1 point to two subsequent bytes.
An int, which is typically 32 bits covers 4 bytes, meaning that 4 different memory addresses exist that each point to part of the int.
In a 32-bit machine, all the 32 actually means is that the CPU is designed to work efficiently with 32-bit values, and that an address is 32 bits long. It doesn't mean that memory can only be addressed in blocks of 32 bits.
The CPU can still address individual bytes, which is useful when dealing with chars, for example.
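These properties can be queried from the implementation rather than assumed. A small sketch using the standard headers <climits> and <limits>; the constant names are made up for illustration.

```cpp
#include <cassert>
#include <climits>
#include <limits>

static_assert(sizeof(char) == 1, "guaranteed by the language");
static_assert(CHAR_BIT >= 8, "a byte has at least 8 bits");

// Largest value a single byte can hold; 255 when CHAR_BIT is 8.
constexpr unsigned max_byte_value = std::numeric_limits<unsigned char>::max();

// Number of bits occupied by an int, derived from its size in bytes.
constexpr unsigned bits_in_int = sizeof(int) * CHAR_BIT;
```

On a typical desktop platform CHAR_BIT is 8 and bits_in_int is 32, but only the two static_assert conditions are guaranteed everywhere.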
As for your example:
struct st
{
int a;
char c;
};
sizeof(st) returns 8 not because all structs have a size divisible by 4, but because of alignment. For the CPU to efficiently read an integer, it must be located at an address that is divisible by the size of the integer (4 bytes). So an int can be placed at address 8, 12 or 16, but not at address 11.
A char only requires its address to be divisible by the size of a char (1), so it can be placed on any address.
So in theory, the compiler could have given your struct a size of 5 bytes... Except that this wouldn't work if you created an array of st objects.
In an array, each object is placed immediately after the previous one, with no padding. So if the first object in the array is placed at an address divisible by 4, then the next object would be placed at a 5 bytes higher address, which would not be divisible by 4, and so the second struct in the array would not be properly aligned.
To solve this, the compiler inserts padding inside the struct, so its size becomes a multiple of its alignment requirement.
Not because it is impossible to create objects that don't have a size that is a multiple of 4, but because one of the members of your st struct requires 4-byte alignment, and so every time the compiler places an int in memory, it has to make sure it is placed at an address that is divisible by 4.
If you create a struct of two chars, it won't get a size of 4. It will usually get a size of 2, because when it contains only chars, the object can be placed at any address, and so alignment is not an issue.
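The array argument above can be checked with alignof and a pointer difference. This is only a sketch: the names St, TwoChars and st_array_stride are chosen for the example, and exact sizes depend on the compiler and target.

```cpp
#include <cassert>
#include <cstddef>

struct St {
    int a;
    char c;
};

struct TwoChars {
    char x;
    char y;
};

// Distance in bytes between consecutive elements of an St array. It equals
// sizeof(St), so any padding has to live inside the struct itself.
std::ptrdiff_t st_array_stride() {
    St items[2] = {};
    return reinterpret_cast<const char*>(&items[1]) -
           reinterpret_cast<const char*>(&items[0]);
}

// A struct's size is always a whole multiple of its alignment, and St must
// be aligned at least as strictly as its int member.
static_assert(sizeof(St) % alignof(St) == 0, "size is a multiple of alignment");
static_assert(alignof(St) >= alignof(int), "St inherits int's alignment");
```

On common 32-bit and 64-bit compilers, sizeof(St) is 8 and sizeof(TwoChars) is 2, in line with the explanation above.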
Sizeof returns the value in bytes. You were talking about bits. 32 bit architectures are word aligned and byte referenced. How the architecture stores a char is irrelevant; to the compiler, chars are referenced 1 byte at a time, even if they use up less than 1 byte.
This is why sizeof(char) is 1.
ints are typically 32 bit, hence sizeof(int) = 4; doubles are typically 64 bit, hence sizeof(double) = 8, etc.
Because of optimisation, padding is added so that the size of an object is 1, 2 or n*4 bytes (or something like that, talking about x86). That's why padding is added to a 5-byte object but not to a 1-byte one. A single char doesn't have to be padded: it can be allocated in 1 byte, and we can store it in space allocated with malloc(1). st cannot be stored in space allocated with malloc(5), because when an st struct is copied, the whole 8 bytes are copied.
It works the same way as using half a piece of paper. You use one part for a char and the other part for something else. The compiler will hide this from you since loading and storing a char into a 32bit processor register depends on the processor.
Some processors have instructions to load and store only parts of the 32bit others have to use binary operations to extract the value of a char.
Addressing a char works as it is AFAIR by definition the smallest addressable memory. On a 32bit system pointers to two different ints will be at least 4 address points apart, char addresses will be only 1 apart.