Overlapping memory with sprintf(snprintf) - c++

Edit: What about if we had this
char value_arr[8];
// value_arr is set to some value
snprintf(value_arr, 8, "%d", *value_arr);
is this behavior defined?
Let's say for some ungainly reason I have
char value_arr[8];
// value_arr is set to some value
int* value_i = reinterpret_cast<int*>(value_arr);
snprintf(value_arr, 8, "%d", *value_i); // the behaviour in question
Is there a guarantee that, for example, if *value_i = 7, then value_arr will take on the value of "7"? Is this behavior defined, such that value_i is first dereferenced, then passed by value, then formatted, and finally stored into the array?
Normally, the value of *value_i can be expected to not change, but storing the string into value_arr violates that.
It seems to function as expected when I test it, but I can't seem to find a definitive answer in the documentation. The function signature has ..., which to my knowledge has something to do with va_list, but I'm afraid I'm not very knowledgeable about the workings of variadic functions.
int sprintf (char* str, const char* format, ... );

For the original code, evaluating the expression *value_i causes undefined behaviour by violating the strict aliasing rule. It is not permitted to alias a char array as int.
For the edited code, snprintf(value_arr, 8, "%d", *value_arr);, it is fine and will format the character code of the first character in the array. Evaluation of function arguments is sequenced-before entering the function. (C++17 intro.execution/11)
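One way to get the result the question asks for without the aliasing problem is to copy the bytes out with std::memcpy and format the copy. This is only a minimal sketch: the name format_stored_int is made up for the example, and it assumes an int fits in the 8-byte array.

```cpp
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical helper: stores `input` as raw bytes in an 8-byte array,
// reads it back without aliasing the array as int, then overwrites the
// array with the decimal text of the value.
std::string format_stored_int(int input) {
    char value_arr[8] = {};
    static_assert(sizeof(int) <= sizeof value_arr, "int must fit in the array");
    std::memcpy(value_arr, &input, sizeof input);  // value_arr is set to some value

    int value = 0;
    std::memcpy(&value, value_arr, sizeof value);  // well-defined read of the bytes

    // `value` is a separate object, so no argument overlaps the destination.
    std::snprintf(value_arr, sizeof value_arr, "%d", value);
    return value_arr;
}
```

Calling format_stored_int(7) returns "7". Values whose text needs more than 7 characters would be truncated by snprintf.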

It's undefined behaviour: you use a pointer of type int* to point to an object of type char[8], which has different (more relaxed) alignment requirements than int. Dereferencing this pointer then yields UB.

The following can be found at https://en.cppreference.com/w/cpp/io/c/fprintf:
If a call to sprintf or snprintf causes copying to take place between objects that overlap, the behavior is undefined.
I would interpret your example to fall into this case and as such it would be classified as Undefined Behaviour, according to this page.
Edit: Some more details at https://linux.die.net/man/3/snprintf:
Some programs imprudently rely on code such as the following
sprintf(buf, "%s some further text", buf);
to append text to buf. However, the standards explicitly note that the results are undefined if source and destination buffers overlap when calling sprintf(), snprintf(), vsprintf(), and vsnprintf(). Depending on the version of gcc(1) used, and the compiler options employed, calls such as the above will not produce the expected results.
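A possible way to append without passing buf as both source and destination is to write at the current end of the string. The sketch below assumes buf already holds a NUL-terminated string and that suffix does not point into buf; append_text is a name invented for this example.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical helper: appends `suffix` to the NUL-terminated string in `buf`.
// `suffix` must not point into `buf`. Returns false if the text was truncated.
bool append_text(char* buf, std::size_t capacity, const char* suffix) {
    const std::size_t used = std::strlen(buf);
    if (used >= capacity) {
        return false;  // buf does not hold a valid string within capacity
    }
    const std::size_t room = capacity - used;
    // Write starts at the terminator, so source and destination never overlap.
    const int written = std::snprintf(buf + used, room, "%s", suffix);
    return written >= 0 && static_cast<std::size_t>(written) < room;
}
```

With char buf[32] = "hello", the call append_text(buf, sizeof buf, " some further text") leaves "hello some further text" in buf.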

Related

C++ changing const variable through pointers [duplicate]

I never thought I would ask this question, but I have no idea why this happens.
const int a = 3;
int *ptr;
ptr = (int*)( &a );
printf( "A=%d\n", &a );
*ptr = 5;
printf( "A=%d\n", ptr );
printf( "A=%d\n", a );
printf( "A=%d\n", *ptr );
Output
A=6945404
A=6945404
A=3
A=5
How can this happen? How can one memory location hold two different values? I searched around and all I find is undefined behavior is undefined. Well that does not make any sense. There must be an explanation.
Edit
I get it, Mark's answer makes a lot of sense, but I still wonder: const was added to the language so that the user does not change the value unintentionally. I get that old compilers allow you to do that, but I tried this on VS 2012 and got the same behavior. Then again, as haccks said, one memory location can't hold two values, yet it looks like it does, so where is the second value stored?
The optimizer can determine that a is a constant value, and replace any reference to it with the literal 3. That explains what you see, although there's no guarantee that's what's actually happening. You'd need to study the generated assembly output for that.
Modifying a const variable through a non-const pointer results in undefined behavior. Most likely the optimizer is substituting the original value in this line:
printf( "A=%d\n", a );
Look at the disassembly to verify this.
The C Standard, subclause 6.7.3, paragraph 6 [ISO/IEC 9899:2011], states:
If an attempt is made to modify an object defined with a const-qualified type through use of an lvalue with non-const-qualified type, the behavior is undefined.
In fact your program invokes undefined behavior because of two reasons:
1. You are printing an address with the wrong specifier, %d. The correct specifier for an address is %p.
2. You are modifying a variable declared const.
If the behavior is undefined then anything could happen. You may get either expected or unexpected result.
The standard says about it:
3.4.3 undefined behavior
behavior, upon use of a nonportable or erroneous program construct or of erroneous data,
for which this International Standard imposes no requirements
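For the first point in the list above, a small sketch of printing an address with %p. The helper name describe is made up, and the exact text %p produces is implementation-defined.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: formats an address and the value stored there.
// %p expects a void*, so the pointer is converted explicitly.
std::string describe(int* p) {
    char line[64];
    std::snprintf(line, sizeof line, "address=%p value=%d",
                  static_cast<void*>(p), *p);
    return line;
}
```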
The problem is that the type of ptr is "pointer to int" not "pointer to const int".
You are then casting the address of 'a' (a const int) to be of type "pointer to int" and storing that address in ptr. The effect of this is that you are casting away the const-ness of a const variable.
This results in undefined behavior so your results may vary from compiler to compiler.
It is possible for the compiler to store 'a' in program ROM since it knows 'a' is a const value that can never be changed. When you lie to the compiler and cast away the const-ness of 'a' so that you can modify it through ptr, it may be invalid for ptr to actually modify the value of 'a', since that data may be stored in program ROM. Instead of giving you a crash, this compiler decided this time to point ptr to a different location with a different value. But anything could have happened, since this behavior is undefined.
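For contrast, here is a sketch of the case where casting away const is well-defined: the object being modified was never declared const, only viewed through a pointer to const. The function name is invented for the example.

```cpp
// Returns 5: the object is not const, so writing through the cast pointer is allowed.
int modify_non_const_object() {
    int a = 3;                          // the object itself is NOT const
    const int* view = &a;               // read-only view of a
    int* ptr = const_cast<int*>(view);  // fine, because a was never const
    *ptr = 5;                           // modifies a
    return a;
}
```

Had a been declared const int, the same write would be the undefined behavior discussed above.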

Pointer behaviour gives value as 1

To be clear, I am aware of pointer to pointer concept in C and of dereferencing double, triple pointers. The only doubt I have is in the following program which I wrote:
#include<stdio.h>
int main(){
int a;
int* p;
a=5;
p=&a;
int **q;
printf("*p=%d\n",*p);
printf("*q=%d\n",*q);
}
Now I know, the program is quite stupid and makes no sense, but that's not the problem. The question is WHY?
Why is the Output like this:
*p=5
*q=1
Why is *q=1 on every run?
Also keep in mind that if I now declare a ***r;
And add the following line:
printf("*r=%d\n",*r);
Now the output is :
*p=5
*q=-40821602 //garbage
*r=1
Now, *r=1. WHY?
Same goes for ****s. In that case, *q and *r are garbage and *s=1. Why?
Evaluation of an uninitialized pointer is undefined behavior, and that is what you did. cppreference states the following on undefined behavior:
Undefined Behavior:
undefined behavior - there are no restrictions on the behavior of the program. Examples of undefined behavior are memory accesses outside of array bounds, signed integer overflow, null pointer dereference, modification of the same scalar more than once in an expression without sequence points, access to an object through a pointer of a different type, etc. Compilers are not required to diagnose undefined behavior (although many simple situations are diagnosed), and the compiled program is not required to do anything meaningful.
Hence, you cannot expect anything meaningful coming out of your program. It may print 1, but it could also do anything else. In my case, for example, it simply crashed.
So the question "why?" is simply not a valid question.
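For comparison, a minimal sketch in which every pointer is initialised before it is dereferenced, so the result is well-defined. The function name is made up; the variables mirror the question's p, q and r.

```cpp
// Each pointer is initialised before use, so every dereference is defined.
int read_through_pointers() {
    int a = 5;
    int* p = &a;     // p points at a
    int** q = &p;    // q points at p
    int*** r = &q;   // r points at q
    return ***r;     // follows r -> q -> p -> a
}
```

Note also that *q has type int*, so even with q initialised it would need %p (with a cast to void*) rather than %d to be printed.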

Weird output when use prefix and postfix on pointer together

Given the code below
char buf[] = "asfsf";
char *a=buf;
++*a++;
cout<<*a;
I expected the result to be the character after 's', that is 't', but the result is still 's'. Why?
Why is ++*a++ not the same as
*a++;
++*a;
cout<<*a;
Is that really a duplicate of the ++i++ question? I know ++i++ is undefined behavior and will cause a compile error, but ++*i++ actually runs. Is my case also undefined behavior?
According to the language grammar, the operators associate as:
++(*a++)
Note: associativity does not imply an order of operations.
*a++ evaluates to an lvalue designating the location where a was originally pointing, with side-effect of modifying a. All fine so far.
Applying prefix-++ to that lvalue increments the value stored there (changing 'a' to 'b').
Although the two increments are unsequenced, this does not cause UB because different objects are being incremented, and the lvalue designating the latter location does not depend on the increment. (It uses the old value of a).
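The effect described above can be sketched by writing ++*a++ as separate steps. The function name and the temporary old are invented for the illustration.

```cpp
#include <string>
#include <utility>

// Returns the final buffer contents and the character `a` ends up pointing at.
std::pair<std::string, char> increment_then_advance() {
    char buf[] = "asfsf";
    char* a = buf;

    // ++*a++ written as separate steps:
    char* old = a;  // the value that a++ yields
    a = a + 1;      // the side effect of a++
    ++*old;         // prefix ++ on the old location: 'a' becomes 'b'

    return {std::string(buf), *a};
}
```

The buffer ends up as "bsfsf" and *a is 's', which is why the question's cout prints 's'.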
As it stands right now, your code has undefined behavior, because it attempts to modify the contents of a string literal.
One way (probably the preferred way) to prevent the compiler from accepting such code is to define your a like:
char const *a="asfsf";
This way, the ++*a part simply won't compile.
For the sake of exposition, let's change the code a little bit, to become:
#include <iostream>
int main(){
char x[]="asfsf";
char *a = x;
++*a++;
std::cout<<x;
}
Now a points at memory we can actually write to, and get meaningful results. This prints out bsfsf. If we print out a, we'll get sfsf.
What's happening is that a++ increments a, but still yields the original value of a. That is dereferenced, giving a reference to the first element of x. Then the pre-increment is applied to that, changing it from a to b.
If you want to increment the pointer, dereference the result, then increment that, you'd use: ++*++a;. Well, no, you wouldn't use that--or at least I hope you wouldn't. It does increment a to point at the second element of the array, then increment that second element to change it from s to t--but anybody who read the code would be completely forgiven if they hated you for writing it that way.
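Written as two statements, the ++*++a variant is easier to read. A small sketch, with a made-up function name:

```cpp
#include <string>
#include <utility>

// Returns the final buffer contents and the character `a` ends up pointing at.
std::pair<std::string, char> advance_then_increment() {
    char x[] = "asfsf";
    char* a = x;
    ++a;   // a now points at the second element, 's'
    ++*a;  // that element changes from 's' to 't'
    return {std::string(x), *a};
}
```

Here the array becomes "atfsf" and *a is 't'.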

Incrementing character in string literal

#include<stdio.h>
int main(){
char *ptr="Helio";
ptr++;
printf("%s\n",ptr);
//*ptr++;
printf("%c\n",++*ptr);/*Segmentation fault on GCC*/
return 0;
}
Q1) This works fine in Turbo C++, but on GCC it gives a segmentation fault. I can't work out the exact reason.
Maybe operator precedence is one of the reasons.
Q2) Does each compiler have a different operator precedence?
As I can see here, ++ has higher precedence than the dereference operator. Maybe GCC and Turbo C++ treat them differently.
No, operator precedence is defined by the C standard, and all compilers follow the same one.
The reason Turbo C++ and GCC give different results in this case is that you modified a string literal, which is undefined behavior.
Change it to:
char arr[] = "Helio";
char *ptr = arr;
and you can modify the content of the string now. Note that arr itself is the array name and cannot be modified, so I added a new pointer variable ptr and initialized it to point to the first element of the array.
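A minimal sketch of the question's steps applied to the writable array. The function name is made up for the example.

```cpp
#include <string>

// Repeats the question's operations on a writable copy of the text.
std::string bump_second_char() {
    char arr[] = "Helio";  // array initialised from the literal: a writable copy
    char* ptr = arr;
    ptr++;                 // ptr now points at 'e'
    ++*ptr;                // 'e' becomes 'f', no write to the literal itself
    return arr;
}
```

The array ends up holding "Hflio", and the program no longer touches the string literal.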
In your last printf() line, the expression ++*ptr is equivalent to ++ptr[0], which is, in turn, equivalent to ptr[0] = ptr[0]+1. Because of the earlier ptr++, ptr[0]=='e', so you are trying to change the value of ptr[0] to 'f'.
That's the key problem. Since &ptr[0] points into the string literal "Helio", the attempt to change that character causes trouble, because it is undefined behaviour.
char* p = "some literal";
This is only legal because of a smelly argument that C people fought over during standards committee negotiations. You should consider it an oddity that exists for backward compatibility.
This is the message you get with GCC:
warning: deprecated conversion from string constant to 'char*'
Please next time, write the following:
char const* p = "some literal";
Make it a reflex in your coding habits. Then you would not have been able to compile your faulty line, which is:
++*ptr
Here you are taking a character of the constant literal (the 'e', since ptr was incremented earlier) and trying to increment it to the character that comes after it, 'f'. But this memory happens to be in a write-protected page, because it is a constant. Modifying it is undefined by the standard and you should consider it illegal. Your segfault comes from here.
I suggest you run your program in valgrind next time to get more elaborate error messages.
In the answer that Yu Hao wrote for you, what is happening is that all the characters get copied one by one, from the constant string pool where literals are stored, to a stack-allocated char array, by code that the compiler generates at the declaration site; therefore you can modify its contents.

Can I safely create references to possibly invalid memory as long as I don't use it?

I want to parse UTF-8 in C++. When parsing a new character, I don't know in advance if it is an ASCII byte or the leader of a multibyte character, and also I don't know if my input string is sufficiently long to contain the remaining characters.
For simplicity, I'd like to name the four next bytes a, b, c and d, and because I am in C++, I want to do it using references.
Is it valid to define those references at the beginning of a function as long as I don't access them before I know that access is safe? Example:
void parse_utf8_character(const string s) {
for (size_t i = 0; i < s.size();) {
const char &a = s[i];
const char &b = s[i + 1];
const char &c = s[i + 2];
const char &d = s[i + 3];
if (is_ascii(a)) {
i += 1;
do_something_only_with(a);
} else if (is_twobyte_leader(a)) {
i += 2;
if (is_safe_to_access_b()) {
do_something_only_with(a, b);
}
}
...
}
}
The above example shows what I want to do semantically. It doesn't illustrate why I want to do this, but obviously real code will be more involved, so defining b,c,d only when I know that access is safe and I need them would be too verbose.
There are three takes on this:
Formally
well, who knows. I could find out for you by using quite some time on it, but then, so could you. Or any reader. And it's not like that's very practically useful.
EDIT: OK, looking it up, since you don't seem happy about me mentioning the formal without looking it up for you. Formally you're out of luck:
N3280 (C++11) §5.7/5 “If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.”
Two situations where this can produce undesired behavior: (1) computing an address beyond the end of a segment, and (2) computing an address beyond an array that the compiler knows the size of, with debug checks enabled.
Technically
you're probably OK as long as you avoid any lvalue-to-rvalue conversion, because if the references are implemented as pointers, then it's as safe as pointers, and if the compiler chooses to implement them as aliases, well, that's also ok.
Economically
relying needlessly on a subtlety wastes your time, and then also the time of others dealing with the code. So, not a good idea. Instead, declare the names when it's guaranteed that what they refer to, exists.
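One way to follow that advice is to work out how many bytes the character needs before naming any of them. The sketch below is only an illustration: sequence_length is an invented helper, it inspects the lead byte alone, and it does not validate the continuation bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: number of bytes in the UTF-8 sequence starting at s[i],
// or 0 if the lead byte is invalid or the string ends too early.
std::size_t sequence_length(const std::string& s, std::size_t i) {
    assert(i < s.size());  // precondition: i indexes an existing byte
    const unsigned char lead = static_cast<unsigned char>(s[i]);
    std::size_t needed = 0;
    if (lead < 0x80)                needed = 1;  // ASCII
    else if ((lead & 0xE0) == 0xC0) needed = 2;  // two-byte leader
    else if ((lead & 0xF0) == 0xE0) needed = 3;  // three-byte leader
    else if ((lead & 0xF8) == 0xF0) needed = 4;  // four-byte leader
    else return 0;                               // continuation or invalid lead byte
    // Compare against the remaining length instead of indexing past the end.
    return (s.size() - i >= needed) ? needed : 0;
}
```

Once this returns n, the bytes s[i] through s[i + n - 1] are known to exist, so references such as b, c and d can be bound inside the branch that needs them.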
Before going into the legality of references to inaccessible memory, you have another problem in your code. Your call to s[i+x] might call string::operator[] with a parameter bigger than s.size(). The C++11 standard says about string::operator[] ([string.access], §21.4.5):
Requires: pos <= size().
Returns: *(begin()+pos) if pos < size(), otherwise a reference to an object of type T with value charT(); the referenced value shall not be modified.
This means that calling s[x] for x > s.size() is undefined behaviour, so the implementation could very well terminate your program, e.g. by means of an assertion, for that.
Since string is now guaranteed to be contiguous, you could get around that problem by using &s[i]+x to get an address. In practice this will probably work.
However, strictly speaking doing this is still illegal unfortunately. The reason for this is that the standard allows pointer arithmetic only as long as the pointer stays inside the same array, or one past the end of the array. The relevant part of the (C++11) standard is in [expr.add], §5.7.5:
If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.
Therefore generating references or pointers to invalid memory locations might work on most implementations, but it is technically undefined behaviour, even if you never dereference the pointer or use the reference. Relying on UB is almost never a good idea, because even if it works for all targeted systems, there are no guarantees about it continuing to work in the future.
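If the goal is only to have short names for the next few bytes, one alternative that never forms an out-of-range index or pointer is to copy the bytes by value through a bounds-checked accessor. This is a sketch, not the asker's code; byte_or_nul is a made-up helper.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: the byte at `pos`, or '\0' when `pos` is past the end.
// The index is checked first, so operator[] is only called in range.
char byte_or_nul(const std::string& s, std::size_t pos) {
    return pos < s.size() ? s[pos] : '\0';
}
```

Inside the loop this allows const char a = byte_or_nul(s, i), b = byte_or_nul(s, i + 1), and so on. A '\0' result does not by itself prove the string ended, so a length check is still needed before a byte is treated as part of a character.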
In principle, the idea of taking a reference for a possibly illegal memory address is itself perfectly legal. The reference is only a pointer under the hood, and pointer arithmetic is legal until dereferencing occurs.
EDIT: This claim is a practical one, not one covered by the published standard. There are many corners of the published standard which are formally undefined behaviour, but don't produce any kind of unexpected behaviour in practice.
Take for example the possibility of computing a pointer to the second item after the end of an array (as @DanielTrebbien suggests). The standard says overflow may result in undefined behaviour. In practice, the overflow would only occur if the upper end of the array is just short of the space addressable by a pointer. Not a likely scenario. Even if it does happen, nothing bad would happen on most architectures. What is violated are certain guarantees about pointer differences, which don't apply here.
@JoSo If you are working with a character array, you can avoid some of the uncertainty about reference semantics by replacing the const-references with const-pointers in your code. That way you can be certain no compiler will alias the values.