memset difference? - c++

I did read the usage of memset on msdn and on cplusplus.com, I know that(please correct me if im wrong):
int p =3;
// p = object value
// &p = memory address where p is stored
so what is the difference of:
char szMain[512];
memset( szMain, 0x61, sizeof( szMain ) );
cout << szMain[4];
and:
char szMain[512];
memset( &szMain, 0x61, sizeof( szMain ) );
cout << szMain[4];
(0x61 = a, ASCII table hex)
why both have same behavior? Please forgive me if this isn't a constructive question.
I'm somewhat of a newbie on c++ and i can't seem to understand.

What is happening is described in the standard as an array-to-pointer conversion.
4.2 Array-to-pointer conversion [conv.array]
An lvalue or rvalue of type “array of N T” or “array of unknown bound
of T” can be converted to an rvalue of type “pointer to T”. The result
is a pointer to the first element of the array.
The above says that in your first call to memset szMain is converted to a pointer (more specifically a pointer to char) storing the address of the first element of your array.
In the second call, &szMain will yield a pointer of type char (*)[512], ie. a pointer to an array which has the same type as szMain, storing the address of the array itself.
Both of these expressions will yield the same exact value since the start of szMain is located at the same address as the first element of it (szMain[0]), but please not that they are not of the same type.
memset accepts a void* as the first argument, therefore any pointer type can be used implicitly to invoke the function.

szMain is the identifier of an array. In most contexts, array identifiers decay to become a pointer to the first element in the array, in other words &szMain[0].
So in your first example, you're passing memset the address of the first element of the array. In the second example, you're passing it the address of the array itself. However, these are exactly the same address.

Yes, in this case the two have the same behavior, but that's not necessarily always the case.
Most uses of an array name give the address of the first element of the array. For an array of T, the type of the pointer will be T *.
If you explicitly take the address of the array instead, you get the same address, but a different type ("pointer to array of N objects of type T" instead of "Pointer to T").
In this case, the address you pass is converted to a void *, so that difference in type is immediately discarded -- but if you use them in another context that doesn't immediately discard the type difference, they won't necessarily have the same effect. Just for example, if you wanted to leave the first element of the array un-initialized, you might do something like:
memset(szMain+1, 'a', sizeof(szMain));
Since the type there is (in this case) "pointer to char", that will start from the second char in the array, as desired. If, however, you did:
memset(&szMain+1, 'a', sizeof(szMain));
Instead of adding 1, it'll add the size of one array, so you'll be passing the address just after the end of the array, instead of the the address of the second element.
As an aside: if you want 'a', I'd advise using it directly as I have above, rather than encoding the value in hex as you did in the question. At the very least, it's a lot more readable. It's theoretically more portable too, though few people really care about the machines (primarily IBM mainframes) where the hex value won't work correctly.

Related

Calculate array length via pointer arithmetic

I was wondering how *(&array + 1) actually works. I saw this as an easy way to calculate the array length and want to understand it properly before using it. I'm not very experienced with pointer arithmetic, but with my understanding &array gives the address of the first element of the array. (&array + 1) would go to end of the array in terms of address. But shouldn't *(&array + 1) give the value, which is at this address. Instead it prints out the address. I would really appreciate your help to get the pointer stuff clear in my head.
Here is the simple example I'm working on:
int numbers[] = {5,8,9,3,4,6,1};
int length = *(&numbers + 1) - numbers;
(This answer is for C++.)
&numbers is a pointer to the array itself. It has type int (*)[7].
&numbers + 1 is a pointer to the byte right after the array, where another array of 7 ints would be located. It still has type int (*)[7].
*(&numbers + 1) dereferences this pointer, yielding an lvalue of type int[7] referring to the byte right after the array.
*(&numbers + 1) - numbers: Using the - operator forces both operands to undergo the array-to-pointer conversion, so pointers can be subtracted. *(&numbers + 1) is converted to an int* pointing at the byte after the array. numbers is converted to an int* pointing at the first byte of the array. Their difference is the number of ints between the two pointers---which is the number of ints in the array.
Edit: Although there's no valid object pointed to by &numbers + 1, this is what's called a "past the end" pointer. If p is a pointer to T, pointing to a valid object of type T, then it's always valid to compute p + 1, even though *p may be a single object, or the object at the end of an array. In that case, you get a "past the end" pointer, which does not point to a valid object, but is still a valid pointer. You can use this pointer for pointer arithmetic, and even dereference it to yield an lvalue, as long as you do not try to read or write through that lvalue. Note that you can only go one byte past-the-end of an object; attempting to go any further leads to undefined behaviour.
The expression &numbers gives you the address of the array, not the first member (although numerically they are the same). The type of this expression is int (*)[7], i.e. a pointer to an array of size 7.
The expression &numbers + 1 adds sizeof(int[7]) bytes to the address of array. The resulting pointer points right after the array.
The problem however is when you then dereference this pointer with *(&numbers + 1). Dereferencing a pointer that points one element past the end of an array invokes undefined behavior.
The proper way to get the number of elements of an array is sizeof(numbers)/sizeof(numbers[0]). This assumes that the array was defined in the current scope and is not a parameter to a function.
but with my understanding &array gives the address of the first element of the array.
This understanding is misleading. &array gives the address of the array. Sure, the value of that address is the same same as the first element, but the type of the expression is different. The type of the expression &array is "pointer to array of N elements of type T" (where N is the length that you're looking for and T is int).
But shouldn't *(&array + 1) give the value, which is at this address.
Well yes... but it's here that the type of the expression becomes important. Indirecting a pointer to an array (rather than pointer to an element of the array) will result in the array itself.
In the subtraction expression, both array operands decay into pointer to first element. Since the subtraction uses decayed pointers, the unit of the pointer arithmetic is in terms of the element size.
I saw this as an easy way to calculate the array length
There are easier ways:
std::size(numbers)
And in C:
sizeof(numbers)/sizeof(numbers[0])

Are pointers arrays?

Here is the code I'm having trouble to understand:
char* myPtr = "example";
myPtr[1] = 'x';
How am I allowed to use myPtr[1]? Why can I choose positions like a do on arrays? myPtr is not even an array.
Obs. I know about lookup table, literal pooling and string literals, my concern is just how this even compile. I don't use pointers that much.
Can anyone help?
Apparently you made an assumption that applicability of [] operator to something necessarily implies that that "something" is an array. This is not true. The built-in [] operator has no direct relation to arrays. The [] is just a shorthand for a combination of * and + operators: by definition a[b] means *(a + b), where one operand is required to be a pointer and another is required to be an integer.
Moreover, when you apply the [] operator to an actual array, that array gets implicitly converted to a pointer type first, and only then the resultant pointer can act as an operand of [] operator. This actually means the opposite of what you supposedly assumed initially: operator [] never works with arrays. By the time we get to the [] the array has already decayed to a pointer.
As a related side-note, this latter detail manifests itself in one obscure peculiarity of the first C language standard. In C89/90 the array-to-pointer conversion was not allowed for rvalue arrays, which also prevented the [] operator from working with such arrays
struct S { int a[10]; };
struct S foo(void) { struct S s = { 0 }; return s; }
int main()
{
foo().a[5];
/* ERROR: cannot convert array to pointer, and therefore cannot use [] */
return 0;
}
C99 expanded the applicability of that conversion thus making the above code valid.
It compiles according to §5.2.1/1 [expr.sub] of the C++ standard:
A postfix expression followed by an expression in square brackets is a postfix expression. One of the expressions shall have the type “array of T” or “pointer to T” and the other shall have unscoped enumeration or integral type. The result is of type “T”. The type “T” shall be a completely-defined object type.
The expression E1[E2] is identical (by definition) to *((E1)+(E2)), except that in the case of an array operand, the result is an lvalue if that operand is an lvalue and an xvalue otherwise.
Since "example" has type char const[8] it may decay to char const* (it used to decay to char* as well, but it's mostly a relict of the past) which makes it a pointer.
At which point the expression myPtr[1] becomes *(myPtr + 1) which is well defined.
Pointers hold the address of memory location of variables of specific data types they are assigned to hold. As others have pointed out its counter-intuitive approach take a bit of learning curve to understand.
Note that the string "example" itself is immutable however, the compiler doesn't prevent the manipulation of the pointer variable, whose new value is changed to address of string 'x' (this is not same as the address of x in 'example'),
char* myPtr = "example";
myPtr[1] = 'x';
Since myPtr is referencing immutable data when the program runs it will crash, though it compiles without issues.
From C perspective, here, you are dereferencing a mutable variable.
By default in C, the char pointer is defined as mutable, unless specifically stated as immutable through keyword const, in which case the binding becomes inseparable and hence you cannot assign any other memory address to the pointer variable after defining it.
Lets say your code looked like this,
const char *ptr ="example";
ptr[1] = 'x';
Now the compilation will fail and you cannot modify the value as this pointer variable is immutable.
You should use char pointer only to access the individual character in a string of characters.
If you want to do string manipulations then I suggest you declare an int to store each character's ASCII values from the standard input output like mentioned here,
#include<stdio.h>
int main()
{
int countBlank=0,countTab=0,countNewLine=0,c;
while((c=getchar())!=EOF)
{
if(c==' ')
++countBlank;
else if(c=='\t')
++countTab;
else if(c=='\n')
++countNewLine;
putchar(c);
}
printf("Blanks = %d\nTabs = %d\nNew Lines = %d",countBlank,countTab,countNewLine);
}
See how the integer takes ASCII values in order to get and print individual characters using getchar() and putchar().
A special thanks to Keith Thompson here learnt some useful things today.
The most important thing to remember is this:
Arrays are not pointers.
But there are several language rules in both C and C++ that can make it seem as if they're the same thing. There are contexts in which an expression of array type or an expression of pointer type is legal. In those contexts, the expression of array type is implicitly converted to yield a pointer to the array's initial element.
char an_array[] = "hello";
const char *a_pointer = "goodbye";
an_array is an array object, of type char[6]. The string literal "hello" is used to initialize it.
a_pointer is a pointer object, of type const char*. You need the const because the string literal used to initialize it is read-only.
When an expression of array type (usually the name of an array object) appears in an expression, it is usually implicitly converted to a pointer to its initial (0th) element. So, for example, we can write:
char *ptr = an_array;
an_array is an array expression; it's implicitly converted to a char* pointer. The above is exactly equivalent to:
char *ptr = &(an_array[0]); // parentheses just for emphasis
There are 3 contexts in which an array expression is not converted to a pointer value:
When it's the operand of the sizeof operator. sizeof an_array yields the size of the array, not the size of a pointer.
When it's the operand of the unary & operator. &an_array yields the address of the entire array object, not the address of some (nonexistent) char* pointer object. It's of type "pointer to array of 6 chars", or char (*)[6].
When it's a string literal used as an initializer for an array object. In the example above:
char an_array[] = "hello";
the contents of the string literal "hello" are copied into an_array; it doesn't decay to a pointer.
Finally, there's one more language rule that can make it seem as if arrays were "really" pointer: a parameter defined with an array type is adjusted so that it's really of pointer type. You can define a function like:
void func(char param[10]);
and it really means:
void func(char *param);
The 10 is silently ignored.
The [] indexing operator requires two operands, a pointer and an integer. The pointer must point to an element of an array object. (A standalone object is treated as a 1-element array.) The expression
arr[i]
is by definition equivalent to
*(arr + i)
Adding an integer to a pointer value yields a new pointer that's advanced i elements forward in the array.
Section 6 of the comp.lang.c FAQ has an excellent explanation of all this stuff. (It applies to C++ as well as to C; the two languages have very similar rules in this area.)
In C++, your code generates a warning during compile:
{
//char* myPtr = "example"; // ISO C++ forbids converting a string
// constant to ‘char*’ [-Wpedantic]
// instead you should use the following form
char myPtr[] = "example"; // a c-style null terminated string
// the myPtr symbol is also treated as a char*, and not a const char*
myPtr[1] = 'k'; // still works,
std::cout << myPtr << std::endl; // output is 'ekample'
}
On the other hand, std::string is much more flexible, and has many more features:
{
std::string myPtr = "example";
myPtr[1] = 'k'; // works the same
// then, to print the corresponding null terminated c-style string
std::cout << myPtr.c_str() << std::endl;
// ".c_str()" is useful to create input to system calls requiring
// null terminated c-style strings
}
The semantics of abc[x] is "Add x*sizeof(type)" to abc where abc is any memory pointer. Arrays variable behave like memory pointers and they just point to beginning of the memory location allocated to array.
Hence adding x to array or pointer variable both will point to memory which is same as variable pointing to + x*sizeof(type which array contains or pointer points to, e.g. in case of int pointers or int array it's 4)
Array variables are not same as pointer as said in comment by Keith as array declaration will create fix sized memory block and any arithmetic on that will use size of array not the element types in that array.

Is array name a constant pointer in C++?

I have a question about the array name a
int a[10]
How is the array name defined in C++? A constant pointer? It is defined like this or just we can look it like this? What operations can be applied on the name?
The C++ standard defines what an array is and its behaviour. Take a look in the index. It's not a pointer, const or otherwise, and it's not anything else, it's an array.
To see a difference:
int a[10];
int *const b = a;
std::cout << sizeof(a); // prints "40" on my machine.
std::cout << sizeof(b); // prints "4" on my machine.
Clearly a and b are not the same type, since they have different sizes.
In most contexts, an array name "decays" to a pointer to its own first element. You can think of this as an automatic conversion. The result is an rvalue, meaning that it's "just" a pointer value, and can't be assigned to, similar to when a function name decays to a function pointer. Doesn't mean it's "const" as such, but it's not assignable.
So an array "is" a pointer much like a function "is" a function pointer, or a long "is" an int. That is to say, it isn't really, but you can use it as one in most contexts thanks to the conversion.
An array name is not a constant pointer - however it acts like one in so many contexts (it converts to one on sight pretty much) that for most purposes it is.
From 6.3.2.1/3 "Other operands/Lvalues, arrays,and function designators":
Except when it is the operand of the sizeof operator or the unary & operator, or is a string literal used to initialize an array, an expression that has type "array of type" is converted to an expression with type "pointer to type" that points to the initial element of the array object and is not an lvalue.

c++: function arg char** is not the same as char*[]

I am using g++. I am using code that had a main(int,char**), renamed so I can call it. I looked at Should I use char** argv or char* argv[] in C?, where char** is said to be equivalent to char* []. This does not appear to be true in c++ function calls. For example:
void f1(char** p){;}
void f2(char* p[]){
f1(p);
//...`
}
fails with the compiler complaining "cannot convert char (*)[] to char**..." The references I look to say that arrays are converted to pointers for the call, but this does not seem to be the case as:
void f3(char* [] p);
char caa[16][16];
f3(caa);
also fails. I had assumed that as long as the levels of indirection were the same (e.g. char*** ptr and char[][][] carray ) the types were interchangeable.
Can someone provide a reference I can review that clarifies these issues?
Thanks.
This still holds true in C++. If your compiler complains as you describe for your first case, it is non-conformant.
To explain your second case, it is important to understand what actually happens. An expression of array type is implicitly convertible to a corresponding pointer type, i.e.: T[n] -> T*. However, if T itself is an array, this case isn't treated specially, and array-to-pointer decay does not propagate. So T*[n] decays to T**, but T[x][y] will only decay to T[y]*, and no further.
From implementation perspective this makes sense, because decaying further, if allowed, would give T**, which is pointer to pointer; whereas 2D C arrays aren't implemented as jagged arrays (i.e. array of pointers to arrays) - they form a single contiguous memory block. So, there's no T* "inside" the array to take an address of to give you a T**. For the allowed cases, a typical implementation simply takes the address of the array as a whole and converts it to type of pointer to single element (when underlying pointer representation is the same for all types, as is usually the case, this convertion is a no-op at run time).
The normative reference here is ISO C++03, 4.2[conv.array]/1:
An lvalue or rvalue of type “array of N T” or “array of unknown bound of T” can be converted to an rvalue of type “pointer to T.” The result is a pointer to the first element of the array.
void f2(char* p[]){
the compiler complaining "cannot convert char (*)[] to char**..."
Strange. char(*)[] is a pointer to array of chars, but in your code snippet the function has char *p[] argument, what means array of pointers to char! These types are indeed different (because array elements have different sizes), let alone your code snippet perfectly compiles. You really have misspelled something.
Sherlock Holmes mode: or is there a typedef involved? ;-)
void f1(char** p){;}
typedef char type[];
void f2(type * p){
f1(p);
}
This really doesn't compile and yields the error you referred to.

Semantics of char a[]

I recently embarrassed myself while explaining to a colleague why
char a[100];
scanf("%s", &a); // notice a & in front of 'a'
is very bad and that the slightly better way to do it is:
char a[100];
scanf("%s", a); // notice no & in front of 'a'
Ok. For everybody getting ready to tell me why scanf should not be used anyway for security reasons: ease up. This question is actually about the meaning of "&a" vs "a".
The thing is, after I explained why it shouldn't work, we tried it (with gcc) and it works =)). I ran a quick
printf("%p %p", a, &a);
and it prints the same address twice.
Can anybody explain to me what's going on?
Well, the &a case should be obvious. You take the address of the array, exactly as expected.
a is a bit more subtle, but the answer is that a is the array. And as any C programmer knows, arrays have a tendency to degenerate into a pointer at the slightest provocation, for example when passing it as a function parameter.
So scanf("%s", a) expects a pointer, not an array, so the array degenerates into a pointer to the first element of the array.
Of course scanf("%s", &a) works too, because that's explicitly the address of the array.
Edit: Oops, looks like I totally failed to consider what argument types scanf actually expects. Both cases yield a pointer to the same address, but of different types. (pointer to char, versus pointer to array of chars).
And I'll gladly admit I don't know enough about the semantics for ellipsis (...), which I've always avoided like the plague, so looks like the conversion to whichever type scanf ends up using may be undefined behavior. Read the comments, and litb's answer. You can usually trust him to get this stuff right. ;)
Well, scanf expects a char* pointer as the next argument when seeing a "%s". But what you give it is a pointer to a char[100]. You give it a char(*)[100]. It's not guaranteed to work at all, because the compiler may use a different representation for array pointers of course. If you turn on warnings for gcc, you will see also the proper warning displayed.
When you provide an argument object that is an argument not having a listed parameter in the function (so, as in the case for scanf when has the vararg style "..." arguments after the format string), the array will degenerate to a pointer to its first element. That is, the compiler will create a char* and pass that to printf.
So, never do it with &a and pass it to scanf using "%s". Good compilers, as comeau, will warn you correctly:
warning: argument is incompatible with corresponding format string conversion
Of course, the &a and (char*)a have the same address stored. But that does not mean you can use &a and (char*)a interchangeably.
Some Standard quotes to especially show how pointer arguments are not converted to void* auto-magically, and how the whole thing is undefined behavior.
Except when it is the operand of the sizeof operator or the unary & operator, or is a
string literal used to initialize an array, an expression that has type ‘‘array of type’’ is converted to an expression with type ‘‘pointer to type’’ that points to the initial element of the array object. (6.3.2.1/3)
So, that is done always - it isn't mentioned below explicitly anymore when listening valid cases when types may differ.
The ellipsis notation in a function prototype declarator causes argument type conversion to stop after the last declared parameter. The default argument promotions are performed on trailing arguments. (6.5.2.2/7)
About how va_arg behaves extracting the arguments passed to printf, which is a vararg function, emphasis added by me (7.15.1.1/2):
Each invocation of the va_arg macro modifies ap so that the
values of successive arguments are returned in turn. The parameter type shall be a type
name specified such that the type of a pointer to an object that has the specified type can be obtained simply by postfixing a * to type. If there is no actual next argument, or if type is not compatible with the type of the actual next argument (as promoted according to the default argument promotions), the behavior is undefined, except for the following cases:
one type is a signed integer type, the other type is the corresponding unsigned integer
type, and the value is representable in both types;
one type is pointer to void and the other is a pointer to a character type.
Well, here is what that default argument promotion is:
If the expression that denotes the called function has a type that does not include a
prototype, the integer promotions are performed on each argument, and arguments that
have type float are promoted to double. These are called the default argument
promotions. (6.5.2.2/6)
It's been a while since I programmed in C but here's my 2c:
char a[100] doesn't allocate a separate variable for the address of the array, so the memory allocation looks like this:
---+-----+---
...|0..99|...
---+-----+---
^
a == &a
For comparison, if the array was malloc'd then there is a separate variable for the pointer, and a != &a.
char *a;
a = malloc(100);
In this case the memory looks like this:
---+---+---+-----+---
...| a |...|0..99|...
---+---+---+-----+---
^ ^
&a != a
K&R 2nd Ed. p.99 describes it fairly well:
The correspondence between indexing
and pointer arithmetic is very close.
By definition, the value of a variable
or expression of type array is the
address of element zero of the array.
Thus after the assignment pa=&a[0];
pa and a have identical values. Since
the name of the array is a synonym for
the location of the initial element,
the assignment pa=&a[0] can also be
written as pa=a;
A C array can be implicitly converted to a pointer to its first element (C99:TC3 6.3.2.1 §3), ie there are a lot of cases where a (which has type char [100]) will behave the same way as &a[0] (which has type char *). This explains why passing a as argument will work.
But don't start thinking this will always be the case: There are important differences between arrays and pointers, eg regarding assignment, sizeof and whatever else I can't think of right now...
&a is actually one of these pitfalls: This will create a pointer to the array, ie it has type char (*) [100] (and not char **). This means &a and &a[0] will point to the same memory location, but will have different types.
As far as I know, there is no implicit conversion between these types and they are not guaranteed to have a compatible representation as well. All I could find is C99:TC3 6.2.5 §27, which doesn't says much about pointers to arrays:
[...] Pointers to other types need not have the same representation or alignment requirements.
But there's also 6.3.2.3 §7:
[...] When a pointer to an object is converted to a pointer to a character type, the result points to the lowest addressed byte of the object. Successive increments of the result, up to the size of the object, yield pointers to the remaining bytes of the object.
So the cast (char *)&a should work as expected. Actually, I'm assuming here that the lowest addressed byte of an array will be the lowest addressed byte of its first element - not sure if this is guaranteed, or if a compiler is free to add arbitrary padding in front of an array, but if so, that would be seriously weird...
Anyway for this to work, &a still has to be cast to char * (or void * - the standard guarantees that these types have compatible representations). The problem is that there won't be any conversions applied to variable arguments aside from the default argument promotion, ie you have to do the cast explicitly yourself.
To summarize:
&a is of type char (*) [100], which might have a different bit-representation than char *. Therefore, an explicit cast must be done by the programmer, because for variable arguments, the compiler can't know to what it should convert the value. This means only the default argument promotion will be done, which, as litb pointed out, does not include a conversion to void *. It follows that:
scanf("%s", a); - good
scanf("%s", &a); - bad
scanf("%s", (char *)&a); - should be ok
Sorry, a tiny bit off topic:
This reminded me of an article I read about 8 years ago when I was coding C full time. I can't find the article but I think it was titled "arrays are not pointers" or something like that. Anyway, I did come across this C arrays and pointers FAQ which is interesting reading.
char [100] is a complex type of 100 adjacent char's, whose sizeof equals to 100.
Being casted to a pointer ((void*) a), this variable yields the address of the first char.
Reference to the variable of this type (&a) yields address of the whole variable, which, in turn, also happens to be the address of the first char