Memset Definition and use - c++

What's the usefulness of the function memset()?
Definition: Sets the first num bytes of the block of memory pointed by ptr to the
specified value (interpreted as an unsigned char).
Does this mean it hard codes a value in a memory address?
memset(&serv_addr, 0, sizeof(serv_addr)) is the example that I'm trying to understand.
Can someone please explain in a VERY simplified way?

memset() is a very fast version of a relatively simple operation:
void* memset(void* b, int c, size_t len) {
char* p = (char*)b;
for (size_t i = 0; i != len; ++i) {
p[i] = c;
}
return b;
}
That is, memset(b, c, l) sets the l bytes starting at address b to the value c. It just does it much faster than the implementation above.

memset() is usually used to initialise values. For example consider the following struct:
struct Size {
int width;
int height;
};
If you create one of these on the stack like so:
struct Size someSize;
Then the values in that struct are going to be undefined. They might be zero, they might be whatever values happened to be there from when that portion of the stack was last used. So usually you would follow that line with:
memset(&someSize, 0, sizeof(someSize));
Of course it can be used for other scenarios, this is just one of them. Just think of it as a way to simply set a portion of memory to a certain value.
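As a rough sketch of the point above, the two functions below zero a Size either with memset or with C++ value-initialization. The function names are made up for illustration and are not part of any library.

```cpp
#include <cstring>

struct Size {
    int width;
    int height;
};

// C style: zero every byte of the struct explicitly.
Size make_zeroed_with_memset() {
    Size s;                          // members are indeterminate at this point
    std::memset(&s, 0, sizeof(s));   // every byte of s is now 0
    return s;
}

// C++ style: value-initialization zeroes the members.
Size make_zeroed_with_braces() {
    Size s{};
    return s;
}
```

For a plain struct of ints like this one, both functions should produce the same result; the second simply avoids the raw byte manipulation.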

memset is a common way to set a memory region to 0 regardless of the data type. One can say that memset doesn't care about the data type and just sets all bytes to zero.
IMHO, in C++ one should avoid memset where possible, since it circumvents the type safety that C++ provides; use constructors or initializers to initialize objects instead. A memset on a class instance may also destroy something unintentionally,
e.g.
class A
{
public:
shared_ptr<char*> _p;
};
A memset on an instance of the above would overwrite the shared_ptr's internals without decrementing the reference count properly.
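One possible guard against this kind of accident, sketched under the assumption that "safe to memset" can be approximated by "trivially copyable", is a small wrapper that refuses to compile for other types. zero_fill, Plain and Owning are hypothetical names used only for this illustration.

```cpp
#include <cstring>
#include <memory>
#include <type_traits>

// Hypothetical helper: only compiles for types without constructors,
// destructors or owning members that memset could trample.
template <typename T>
void zero_fill(T& obj) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "memset is only reasonable on trivially copyable types");
    std::memset(&obj, 0, sizeof(T));
}

struct Plain  { int x; int y; };            // fine to memset
struct Owning { std::shared_ptr<char> p; }; // zero_fill(Owning&) would not compile

static_assert(std::is_trivially_copyable<Plain>::value, "Plain is trivially copyable");
static_assert(!std::is_trivially_copyable<Owning>::value, "Owning is not");
```

Calling zero_fill on an Owning object triggers the static_assert at compile time, which is usually preferable to silently corrupting the reference count at run time.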

I guess that serv_addr is some local or global variable of some struct type -perhaps struct sockaddr- (or maybe a class).
&serv_addr is taking the address of that variable. It is a valid address, given as first argument to memset. The second argument to memset is the byte to be used for filling (zero byte). The last argument to memset is the size, in bytes, of that memory zone to fill, which is the size of that serv_addr variable in your example.
So this call to memset clears a global or local variable serv_addr containing some struct.
In practice, the GCC compiler, when it is optimizing, will generate clever code for that, usually unrolling and inlining it (actually, it is often a builtin, so GCC can generate very clever code for it).

It does nothing more than set a block of memory to a particular value.
Here is example code.
In Memset(uint8_t *p, uint8_t V, uint8_t L), p is the pointer to the target memory, V is the value each byte of the target buffer will be set to, and L is the length of the data in bytes.
#include <stdint.h>
void Memset(uint8_t *p, uint8_t V, uint8_t L)
{
while (L-- > 0)
{
*p++ = V;
}
}

memset- set bytes in memory
Synopsis-
#include<string.h>
void *memset(void *s, int c, size_t n);
Description- The memset() function shall copy c (converted to an unsigned char) into each of the first n bytes of the object pointed to by s.
Return value: the memset() function shall return s.
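The "converted to an unsigned char" part of the description is easy to overlook, so here is a small sketch of what it implies. fill_u32 is a hypothetical helper written for this example only.

```cpp
#include <cstdint>
#include <cstring>

// Sets every byte of a 32-bit value to the low 8 bits of c and returns the result.
std::uint32_t fill_u32(int c) {
    std::uint32_t v;
    std::memset(&v, c, sizeof v);   // c is converted to unsigned char first
    return v;
}
```

With this helper, fill_u32(1) yields 0x01010101 rather than 1, because each of the four bytes is set to 1. This is why memset is mostly useful for filling with 0 (or with single-byte patterns) and not for setting arrays of int to arbitrary values.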

Related

Why can a pointer avoid the -Warray-bounds warning?

For the code(Full demo) like:
#include <iostream>
struct A
{
int a;
char ch[1];
};
int main()
{
volatile A *test = new A;
test->a = 1;
test->ch[0] = 'a';
test->ch[1] = 'b';
test->ch[2] = 'c';
test->ch[3] = '\0';
std::cout << sizeof(*test) << std::endl
<< test->ch[0] << std::endl;
}
I need to ignore the compilation warning like
warning: array subscript 1 is above array bounds of 'volatile char [1]' [-Warray-bounds]
which is raised by gcc8.2 compiler:
g++ -O2 -Warray-bounds=2 main.cpp
One way to silence this warning is to access the four characters through a pointer:
#include <iostream>
struct A
{
int a;
char ch[1];
};
int main()
{
volatile A *test = new A;
test->a = 1;
// Use pointer to avoid the warning
volatile char *ptr = test->ch;
*ptr = 'a';
*(ptr + 1) = 'b';
*(ptr + 2) = 'c';
*(ptr + 3) = '\0';
std::cout << sizeof(*test) << std::endl
<< test->ch[0] << std::endl;
}
But I cannot figure out why using a pointer instead of an array subscript works. Is it because a pointer carries no bounds checking for what it points to? Can anyone explain that?
Thanks.
Background:
Because of the struct's padding and alignment, ch[1] to ch[3] in struct A lie outside the declared array bounds but still inside the memory occupied by the struct, so from the memory's point of view nothing overflows.
Why don't we just declare ch as ch[4] in struct A to avoid this warning?
Answer:
struct A in our app code is generated by other script while compiling. The design rule for struct in our app is that if we do not know the length of an array, we declare it with one member, place it at the end of the struct, and use another member like int a in struct A to control the array length.
Due to padding and alignment of memory for struct, though ch[1] – ch[3] in struct A is out of declared array boundary, it is still not overflow for memory view, so we want to ignore this warning.
C++ does not work the way you think it does. You are triggering undefined behavior. When your code triggers undefined behavior, the C++ standard places no requirement on its behavior. A version of GCC attempts to start some video games when certain kinds of undefined behavior are encountered. Anthony Williams also knows of at least one case where a particular instance of undefined behavior caused someone's monitor to catch fire. (C++ Concurrency in Action, page 106) Your code may appear to work at this very time and in this situation, but that is just one possible outcome of undefined behavior and you cannot count on it. See Undefined, unspecified and implementation-defined behavior.
The correct way to suppress this warning is to write correct C++ code with well-defined behavior. In your case, declaring ch as char ch[4]; solves the problem.
The standard specifies this as undefined behavior in [expr.add]/4:
When an expression J that has integral type is added to or subtracted from an expression P of pointer type, the result has the type of P.
If P evaluates to a null pointer value and J evaluates to 0, the result is a null pointer value.
Otherwise, if P points to an array element i of an array object x with n elements ([dcl.array]) (footnote 78), the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) array element i + j of x if 0 ≤ i + j ≤ n and the expression P - J points to the (possibly-hypothetical) array element i − j of x if 0 ≤ i − j ≤ n.
Otherwise, the behavior is undefined.
78) An object that is not an array element is considered to belong to a single-element array for this purpose; see [expr.unary.op]. A pointer past the last element of an array x of n elements is considered to be equivalent to a pointer to a hypothetical array element n for this purpose; see [basic.compound].
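A minimal sketch of the fix suggested in this answer, assuming the struct definition can actually be changed: with ch declared as four elements, every access stays inside the array and no warning is produced. store_abc is a hypothetical helper name.

```cpp
#include <cstring>

struct A {
    int a;
    char ch[4];   // room for 'a', 'b', 'c' and the terminating '\0'
};

// Writes the same four characters as the question, but within bounds.
void store_abc(A& obj) {
    obj.a = 1;
    std::memcpy(obj.ch, "abc", sizeof obj.ch);   // "abc" is exactly 4 bytes with its '\0'
}
```

On typical platforms this struct is no larger than the original one, because the three extra characters occupy what used to be padding.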
I want to avoid the warning like
warning: array subscript 1 is above array bounds of 'volatile char [1]' [-Warray-bounds]
Well, it is probably better to fix the warning, not just avoid it.
The warning is actually telling you something: what you are doing is undefined behavior. Undefined behavior is really bad (it allows your program to do literally anything!) and should be fixed.
Let's look at your struct again:
struct A
{
int a;
char ch[1];
};
In C++, your array has only one element in it. The standard only guarantees array elements of 0 through N-1, where N is the size of the array:
[dcl.array]
...If the value of the constant expression is N, the array
has N elements numbered 0 to N-1...
So ch only has the elements 0 through 1-1, or elements 0 through 0, which is just element 0. That means accessing ch[1], ch[2] overruns the buffer, which is undefined behavior.
Due to padding and alignment of memory for struct, though ch[1]-ch[3] in struct A is out of declared array boundary, it is still not overflow for memory view, so we want to ignore this warning.
Umm, if you say so. The example you gave only allocated 1 A, so as far as we know, there is still only space for the 1 character. If you do allocate more than 1 A at a time in your real program, then I suppose this is possible. But that's still probably not a good thing to do. Especially since you might run into int a of the next A if you're not careful.
A solution to ignore this warning is to use pointer...But I can not figure out why that works. Is it because pointer do not have boundary checking for which it point?
Probably. That would be my guess too. Pointers can point to anything (including destroyed data or even nothing at all!), so the compiler probably won't check it for you. The compiler may not even have a way of knowing whether the memory you point to is valid or not (or may just not care), and, thus, may not even have a way to warn you, much less will warn you. Its only choice is to trust you, so I'm guessing that's why there's no warning.
Why don't we just declare the ch to ch[4] in struct A to avoid this warning?
Side issue: actually std::string is probably a better choice here if you don't know how many characters you want to store in here ahead of time--assuming it's different for every instance of A. Anyway, moving on:
Why don't we just declare the ch to ch[4] in struct A to avoid this warning?
Answer:
struct A in our app code is generated by other script while compiling. The design rule for struct in our app is that if we do not know the length of an array, we declare it with one member, place it at the end of the struct, and use another member like int a in struct A to control the array length.
I'm not sure I understand your design principle completely, but it sounds like std::vector might be a better option. Then, size is kept track of automatically by the std::vector, and you know that everything is stored in ch. To access it, it would be something like:
myVec[i].ch[0]
I don't know all your constraints for your situation, but it sounds like a better solution instead of walking the line around undefined behavior. But that's just me.
Finally, if you still really want to ignore our advice, you do have the option of turning the warning off, but again, I'd advise against that. It'd be better to fix A if you can, or find a better usage strategy if you can't.
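To illustrate the container suggestion from this answer, here is one way the struct might look if its memory layout is not dictated by an external format (an assumption; the question says the struct is generated by a script, so this may not apply). Record and make_record are hypothetical names.

```cpp
#include <string>
#include <utility>

struct Record {        // hypothetical replacement for struct A
    int a;
    std::string ch;    // grows as needed and tracks its own length
};

// Builds a Record without any manual size bookkeeping.
Record make_record(int a, std::string text) {
    return Record{a, std::move(text)};
}
```

The length that the original design kept in a separate member is then available as ch.size(), and out-of-range access can be checked with ch.at(i).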
There really is no way to work with this cleanly in C++ and iirc the type (a dynamically sized struct) isn't actually properly formed in C++. But you can work with it because compilers still try to preserve compatibility with C. So it works in practice.
You can't have a value of the struct, only references or pointers to it. And they must be allocated by malloc() and released by free(). You can't use new and delete. Below I show you a way that only allows you to allocate pointers to variable sized structs given the desired payload size. This is the tricky bit as sizeof(Buf) will be 16 (and not 8) because Buf::buf must have a unique address. So here we go:
#include <cstddef>
#include <cstdint>
#include <stdlib.h>
#include <new>
#include <iostream>
#include <memory>
struct Buf {
size_t size {0};
char buf[];
[[nodiscard]]
static Buf * alloc(size_t size) {
void *mem = malloc(offsetof(Buf, buf) + size);
if (!mem) throw std::bad_alloc();
return std::construct_at(reinterpret_cast<Buf*>(mem), AllocGuard{}, size);
}
private:
class AllocGuard {};
public:
Buf(AllocGuard, size_t size_) noexcept : size(size_) {}
};
int main() {
Buf *buf = Buf::alloc(13);
std::cout << "buffer has size " << buf->size << std::endl;
}
You should delete or implement the copy/move constructors and assignment operators as desired. Another good idea would be to use std::unique_ptr or std::shared_ptr with a deleter that calls free() instead of returning a naked pointer. But I leave that as an exercise for the reader.
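For readers who want a starting point for that exercise, the following is a rough sketch of a std::unique_ptr with a free()-based deleter. It deliberately uses a simplified, hypothetical Header type instead of Buf (no flexible array member, trivially destructible), so it shows only the ownership part, not payload access.

```cpp
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <new>

struct Header {            // hypothetical stand-in for the fixed part of Buf
    std::size_t size;
};

// Deleter that releases malloc'ed storage; Header needs no destructor call.
struct FreeDeleter {
    void operator()(Header* p) const noexcept { std::free(p); }
};

using HeaderPtr = std::unique_ptr<Header, FreeDeleter>;

// Allocates the header plus payload_size extra bytes in one malloc block.
HeaderPtr make_header(std::size_t payload_size) {
    void* mem = std::malloc(sizeof(Header) + payload_size);
    if (!mem) throw std::bad_alloc();
    return HeaderPtr(new (mem) Header{payload_size});
}
```

The caller never sees a naked pointer, and the block is released with free() when the HeaderPtr goes out of scope.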

Pascal and Delphi Arrays to C/C++ Arrays

In pascal and delphi, arrays have their lengths stored at some offset in memory from the array's pointer. I found that the following code works for me and it gets the length of an array:
type PInt = ^Integer; //pointer to integer.
Function Length(Arr: PInt): Integer;
var
Ptr: PInt;
Begin
Ptr := Arr - sizeof(Integer);
Result := Ptr^ + 1;
End;
Function High(Arr: PInt): Integer; //equivalent to length - 1.
Begin
Result := (Arr - sizeof(Integer))^;
End;
I translated the above code into C++ and it thus becomes:
int Length(int* Arr)
{
int* Ptr = (Arr - sizeof(int));
return *reinterpret_cast<char*>(Ptr) + 1;
}
int High(int* Arr)
{
return *(Arr - sizeof(int));
}
Now assuming the above are equivalent to the Pascal/Delphi versions, how can I write a struct to represent a Pascal Array?
In other words, how can I write a struct such that the following is true:
Length(SomeStructPointer) = SomeStructPointer->size
I tried the following:
typedef struct
{
unsigned size;
int* IntArray;
} PSArray;
int main()
{
PSArray ps;
ps.IntArray = new int[100];
ps.size = 100;
std::cout<<Length((int*) &ps); //should print 100 or the size member but it doesn't.
delete[] ps.IntArray;
}
In Pascal and Delphi, arrays have their lengths stored at
some offset in memory from the array's pointer.
This is not so. The entire premise of your question is wrong. The Delphi functions you present do not work in general. They might work for dynamic arrays. But it is certainly not the case that you can pass a pointer to an array and be sure that the length is stored before it.
And in fact the Delphi code in the question does not even work for dynamic arrays. Your pointer arithmetic is all wrong. You read a value 16 bytes to the left rather than 4 bytes. And you fail to check for nil. So it's all a bit of a disaster really.
Moving on to your C++ code, you are reaping the result of this false premise. You've allocated an array. There's no reason to believe that the int to the left of the array holds the length. Your C++ code is also very broken. But there's little point attempting to fix it because it can never be fixed. The functions you define cannot be implemented. It is simply not the case that an array is stored adjacent to a variable containing the length.
What you are looking for in your C++ code is std::vector. That offers first class support for obtaining the length of the container. Do not re-invent the wheel.
If interop is your goal, then you need to use valid interop types. And Delphi managed dynamic arrays do not qualify. Use a pointer to an array, and a separately passed length.
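The "pointer plus separately passed length" convention recommended here could look roughly like this on the C++ side. sum_ints is a hypothetical function invented for the example; the Delphi declaration that would import it is not shown.

```cpp
#include <cstddef>

// C-compatible signature: the caller supplies the element pointer and the count,
// so nothing has to be read from memory in front of the array.
extern "C" long long sum_ints(const int* data, std::size_t count) {
    long long total = 0;
    for (std::size_t i = 0; i < count; ++i) {
        total += data[i];
    }
    return total;
}
```

Because the length travels as an ordinary argument, the function works the same way for stack arrays, heap arrays and slices in the middle of a larger array.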
Why? I can see no good reason to do this. Use idiomatic Pascal in Pascal, use idiomatic C++ in C++. Using sizeof like that also ignores padding, and so your results may vary from platform to platform.
If you want a size, store it in the struct. If you want a non-member length function, just write one that works with the way you wrote the struct. Personally, I suggest using std::array if the size won't change and std::vector if it will. If you absolutely need a non-member length function, try this:
template<typename T>
auto length(const T& t) -> decltype(t.size()) {
return t.size();
}
That will work with both std::array and std::vector.
PS: If you're doing this for "performance reasons", please profile your code and prove that there is a bottleneck before doing something that will become a maintenance hazard.

C++ Call by Ref. with a dynamic sized struct without knowing its size

I need to use a function (part of an API) which stores some requested data into a dynamic sized struct using call by reference. The struct is defined as follows - it concerns access control lists of either posix or NFS4 version, but that is just the use case, I guess.
typedef struct my_struct
{
unsigned int len; /* total length of the struct in bytes */
... /* some other static sized fields */
unsigned int version; /* 2 different versions are possible */
unsigned int amount; /* number of entries that follow */
union {
entry_v1_t entry_v1[1];
entry_v2_t entry_v2[1];
};
} my_struct_t;
There are 2 versions of the entries and I know which one I will obtain (v1). Both entry_v1_t and entry_v2_t are fixed (but different) sized structs just containing integers (so I guess they are not worth being explained here). Now I need to use an existing function to fill my structure with the information I need using Call by Reference, the signature is as follows, including the comments - I don't have access to the implementation:
int get_information(char *pathname, void *ptr);
/* The ptr parameter must point to a buffer mapped by the my_struct
* structure. The first four bytes of the buffer must contain its total size.
*/
So the point is that I must allocate memory for that struct but don't know how many entries (and, as a consequence, what total size) to allocate for. Have you ever dealt with such a situation?
Under the Windows API there are many such functions: you normally call them with a NULL pointer to get the size of the buffer, then call again with an allocated buffer. If the required size has changed by the next call, the function returns an error and you need to allocate again. So you do it in a while loop until the function returns with success.
So your get_information must somehow implement such a mechanism: either it returns an error if the buffer is too small, or it returns the correct size if ptr is NULL. But that is just my guess.
OK, I think I figured out how it works. Thanks for your ideas and notes. I declared a my_struct pointer and allocated the minimum space for the fixed-size fields (5) before the dynamic array => 5 * sizeof(unsigned int). Invoking get_information with that pointer returns -1 and sets errno = 28 and strerror(errno) = "No space left on device".
But it sets the my_struct->len field to the required size, and that seems to be the answer to my question (how should you know?). Now I can invoke get_information initially with the minimum space to figure out how much I need to allocate, and afterwards call it again with correctly sized memory to get the information successfully.
The loop solution seems to make sense anyway and would have been my next try - since there are usually just a few entries in that dynamic array.
Thank you.
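The query-then-retry pattern described in this thread might be sketched as follows. Since the real get_information cannot be called here, the code takes the query as a parameter and assumes the behavior reported above: the first four bytes of the buffer hold its size on input, and on failure the callee returns -1 and overwrites them with the required size. fetch_sized and the retry limit of 8 are inventions for this example.

```cpp
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// query(void* buf) is assumed to return 0 on success and -1 when the buffer
// is too small, in which case it stores the required size in the first 4 bytes.
template <typename Query>
std::vector<unsigned char> fetch_sized(Query query) {
    std::vector<unsigned char> buf(sizeof(std::uint32_t));
    for (int attempt = 0; attempt < 8; ++attempt) {
        std::uint32_t len = static_cast<std::uint32_t>(buf.size());
        std::memcpy(buf.data(), &len, sizeof len);    // announce the buffer size
        if (query(buf.data()) == 0) {
            return buf;                               // filled successfully
        }
        std::uint32_t needed = 0;
        std::memcpy(&needed, buf.data(), sizeof needed);
        if (needed <= buf.size()) {
            throw std::runtime_error("query failed for another reason");
        }
        buf.assign(needed, 0);                        // grow and try again
    }
    throw std::runtime_error("required size kept changing");
}
```

With the real API, the query would presumably be a lambda wrapping get_information(pathname, buf), and the returned bytes would then be interpreted as a my_struct_t.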

Size of an Array.... in C/C++?

Okay so you have an array A[]... that is passed to you in some function, say with the following function prototype:
void foo(int A[]);
Okay, as you know it's kind of hard to find the size of that array without knowing some sort of ending variable or knowing the size already...
Well, here is the deal though. I have seen some people figure it out on a challenge problem, and I don't understand how they did it. I wasn't able to see their source code of course, which is why I am here asking.
Does anyone know how it would even be remotely possible to find the size of that array?? Maybe something like what the free() function does in C??
What do you think of this??
template<typename E, int size>
int ArrLength(E(&)[size]){return size;}
void main()
{
int arr[17];
int sizeofArray = ArrLength(arr);
}
The signature of that function is not that of a function taking an array, but rather a pointer to int. You cannot obtain the size of the array within the function, and will have to pass it as an extra argument to the function.
If you are allowed to change the signature of the function there are different alternatives:
C/C++ (simple):
void f( int *data, int size ); // function
f( array, sizeof array/sizeof array[0] ); // caller code
C++:
template <int N>
void f( int (&array)[N] ); // Inside f, size N embedded in type
f( array ); // caller code
C++ (through a dispatch):
void f( int *array, int size ); // Actual function, as per option 1; declared first so the dispatcher can see it
template <int N>
void f( int (&array)[N] ) { // Dispatcher
f( array, N );
}
f( array ); // Compiler processes the type as per 2
You cannot do that. Either you have a convention to signal the end of the array (e.g. that it is made of non-zero integers followed by a 0), or you transmit the size of the array (usually as an additional argument).
If you use the Boehm garbage collector (which has a lot of benefits; in particular, you allocate with GC_malloc and friends and don't have to care about freeing memory explicitly), you could use the GC_size function to give you the size of a GC_malloc-ed memory zone, but standard malloc doesn't have this feature.
You're asking what we think of the following code:
template<typename E, int size>
int ArrLength(E(&)[size]){return size;}
void main()
{
int arr[17];
int sizeofArray = ArrLength(arr);
}
Well, void main has never been standard, neither in C nor in C++.
It's int main.
Regarding the ArrLength function, a proper implementation does not work for local types in C++98. It does work for local types by C++11 rules. But in C++11 you can write just end(a) - begin(a).
The implementation you show is not proper: it should absolutely not have int template argument. Make that a ptrdiff_t. For example, in 64-bit Windows the type int is still 32-bit.
Finally, as general advice:
Use std::vector and std::array.
One relevant benefit of this approach is that it avoids throwing away the size information, i.e. it avoids creating the problem you're asking about. There are also many other advantages. So, try it.
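As a small sketch of the two suggestions in this answer (end(a) - begin(a) for built-in arrays, and standard containers that keep their size), assuming C++11 or later. array_length and sum are hypothetical names.

```cpp
#include <cstddef>
#include <iterator>
#include <vector>

// Built-in array passed by reference: the bound is still part of the type,
// so the element count can be computed from its begin and end.
template <typename T, std::size_t N>
std::ptrdiff_t array_length(T (&a)[N]) {
    return std::end(a) - std::begin(a);
}

// std::vector carries its size along, so no extra length argument is needed.
long long sum(const std::vector<int>& v) {
    long long total = 0;
    for (int x : v) {
        total += x;
    }
    return total;
}
```

Note that array_length only works where the argument is still a real array; once it has been passed through a parameter declared as int A[], the size is gone.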
The first element could be a count, or the last element could be a sentinel. That's about all I can think of that could work portably.
In new code, for container-agnostic code prefer passing two iterators (or pointers in C) as a much better solution than just passing a raw array. For container-specific code use the C++ containers like vector.
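The iterator-pair style mentioned here could look like the following sketch. count_positive is a made-up example function; the same template accepts raw pointers into an array as well as iterators of standard containers.

```cpp
#include <iterator>

// Counts the elements greater than zero in the half-open range [first, last).
template <typename It>
typename std::iterator_traits<It>::difference_type
count_positive(It first, It last) {
    typename std::iterator_traits<It>::difference_type n = 0;
    for (; first != last; ++first) {
        if (*first > 0) {
            ++n;
        }
    }
    return n;
}
```

The caller decides where the range ends, so the function never needs to discover the size of anything.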
No you can't. Your prototype is equivalent to
void foo(int * A);
there is obviously no size information. Also implementation dependent tricks can't help:
the array variable can be allocated on the stack or be static, so there is no information provided by malloc or friends
if allocated on the heap, a user of that function is not forced to call it with the first element of an allocation.
e.g the following are valid
int B[22];
foo(B);
int * A = new int[33];
foo(A + 25);
This is something that I would not suggest doing, however if you know the address of the beginning of the array and the address of the next variable/structure defined, you could subtract the address. Probably not a good idea though.
Probably an array allocated at compile time has information on its size in the debug information of the executable. Moreover, one could search the code for all the addresses of compile-time allocated variables and assume the size of the array is the difference between its starting address and the next closest starting address of any variable.
For a dynamically allocated variable it should be possible to get its size from the heap data structures.
It is hacky and system dependent, but it is still a possible solution.
One estimate is as follows: if you have for instance an array of ints but know that they are between (stupid example) 0..80000, the first array element that's either negative or larger than 80000 is potentially right past the end of the array.
This can sometimes work because the memory right past the end of the array (I'm assuming it was dynamically allocated) won't have been initialized by the program (and thus might contain garbage values), but might still be part of the allocated pages, depending on the size of the array. In other cases it will crash or fail to provide meaningful output.
All of the other answers are probably better, i.e. you either have to pass the length of the array or terminate it with a special byte sequence.
The following method is not portable, but it works for me in VS2005:
int getSizeOfArray( int* ptr )
{
int size = 0;
void* ptrToStruct = ptr;
long adr = (long)ptrToStruct;
adr = adr - 0x10;
void* ptrToSize = (void*)adr;
size = *(int*)ptrToSize;
size /= sizeof(int);
return size;
}
This is entirely dependent of the memory model of your compiler and system so, again, it is not portable. I bet there are equivalent methods for other platforms. I would never use this in a production environment, merely stating this as an alternative.
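You can use this where A is still a true array in scope (not inside foo, where A has already decayed to a pointer and sizeof(A) is just the size of a pointer): int n = sizeof(A) / sizeof(A[0]);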

Setting pointers to structs

I have the following struct:
struct Datastore_T
{
Partition_Datastores_T cmtDatastores; // bytes 0 to 499
Partition_Datastores_T cdhDatastores; // bytes 500 to 999
Partition_Datastores_T gncDatastores; // bytes 1000 to 1499
Partition_Datastores_T inpDatastores; // bytes 1500 to 1999
Partition_Datastores_T outDatastores; // bytes 2000 to 2499
Partition_Datastores_T tmlDatastores; // bytes 2500 to 2999
Partition_Datastores_T sm_Datastores; // bytes 3000 to 3499
};
I want to set a char* to point to a struct of this type like so:
struct Datastore_T datastores;
// Elided: datastores is initialized with data here
char* DatastoreStartAddr = (char*)&datastores;
memset(DatastoreStartAddr, 0, 3500);
The problem I have is that DatastoreStartAddr always has a value of zero when it should point to the struct that has been initialized with data.
What am I doing wrong?
Edit: What I mean by zero is that the "values" in the structure are all zeros even after I initialize the structure. The address is not zero, it is the values in the struct that are zero.
Edit: I think I am asking the question wrong. Let's start over. If I have a struct that is initialized with data, and another object maintains a field member that is a pointer to that struct, if the struct is changed directly:
struct Datastore_T datastores;
char* DatastoreStartAddr = (char*)&datastores;
datastores.cmtDatastores.u16Region[0] = Scheduler.GetMinorFrameCount(); // byte 40,41
datastores.cmtDatastores.u16Region[1] = Scheduler.GetMajorFrameCount(); // byte 42,43
Shouldn't I be able to access these changes using the DatastoreStartAddr pointer?
EDIT: The following code tries to read the data set in datastores, but using the pointer to the struct:
CMT_UINT8_Tdef PayLoadBuffer[1500]= {NULL};
int TDIS = 0;
int DIS = 0;
int DSA = 0;
//copy DataStore info using address and size offsets
if ((PayLoadBuffer + TDIS + DIS) < IndvDEMMax)
{
memcpy((PayLoadBuffer + TDIS), Datastores+DSA, DIS);
TDIS += DIS;
}
In the memcpy((PayLoadBuffer + TDIS), Datastores+DSA, DIS) line, Datastores should point to structure and attempts to access an offset in that structure. But since the value is always zero, it copies zero in the PayLoadBuffer.
I don't know why you are getting an address of zero, but I would guess the code you don't show has something to do with it. Some other points:
Consider using an array of Partition_Datastores_T inside your struct
Do not use magic numbers for struct sizes, you want sizeof(Datastore_T )
There is no need for the intermediate char*
Edit: Bobby, to answer your supplementary question - yes you should be able to access it through a pointer, but not through a char * (without jumping through some hoops). You want:
struct Datastore_T datastores;
struct Datastore_T * DatastoreStartAddr = &datastores;
and when you use that pointer:
DatastoreStartAddr->cmtDatastores.u16Region[0] = Scheduler.GetMinorFrameCount();
Please note the use of the -> operator.
I just tested your code and it is not zero. Try posting a bigger piece of it.
You are doing it in the wrong order - should be like this
struct Datastore_T datastores;
char* DatastoreStartAddr = (char*)&datastores;
memset(DatastoreStartAddr, 0, sizeof(Datastore_T));
// Elided: datastores is initialized with data here
Now datastores is still initialized. And like everyone else says you might want this instead
struct Datastore_T * DatastoreStartAddr = &datastores;
memset((void *)DatastoreStartAddr, 0, sizeof(Datastore_T));
I'm assuming that the code you are not showing is correct. It might be wise to show it to us for scrutiny. Likely, what I say below is not the problem at all.
Because of the cast, it might be that you're having aliasing issues here. If strict aliasing is in effect (e.g. -fstrict-aliasing on gcc, which is enabled at -O2 and above), the compiler may assume that two pointers of different types can never refer to the same thing. Then either or both of the representations might be cached in different CPU registers, so updates to one would never be reflected in the other. (Note that char* is explicitly allowed to alias any object, so this would only bite if the data were read through some other pointer type.)
Again, this is a long shot. Considering the size of your struct (I didn't see that when I started answering your duplicate question), it is very unlikely that it would reside anywhere else but in main memory. But you could try turn down your compiler optimizations and see if it makes a difference.
At what point is your structure's values zero? If it's before the cast and memset(), the problem is with your initialization. If it is after the cast and memset(), then the values in your structure are zero because memset() overwrote with 0's the values you had initialized it with. The values in datastores should also be zero after the memset().
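To make the ordering issue concrete, here is a small sketch with a hypothetical Store struct that is much smaller than Datastore_T. It contrasts "clear, then write, then read through a byte pointer" with "write, then clear".

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

struct Store {                       // hypothetical miniature of Datastore_T
    std::uint16_t u16Region[4];
};

// Correct order: the value written after memset is visible through a byte pointer.
std::uint16_t clear_write_read() {
    Store s;
    std::memset(&s, 0, sizeof s);                    // 1. zero the whole struct
    s.u16Region[1] = 1234;                           // 2. initialize afterwards
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(&s);
    std::uint16_t out = 0;
    std::memcpy(&out,
                bytes + offsetof(Store, u16Region) + sizeof(std::uint16_t),
                sizeof out);                         // 3. read element 1 via the bytes
    return out;
}

// Wrong order: memset wipes out the data that was just written.
std::uint16_t write_then_clear() {
    Store s;
    s.u16Region[1] = 1234;
    std::memset(&s, 0, sizeof s);
    return s.u16Region[1];
}
```

The first function returns 1234 and the second returns 0, which matches the symptom in the question if the memset runs after the struct has been filled in.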