Related to strings - c++

//SECTION I:
void main()
{
char str[5] = "12345"; //---a)
char str[5] = "1234"; //---b)
cout<<"String is: "<<str<<endl;
}
Output: a) Error: Array bounds Overflow.
b) 1234
//SECTION II:
void main()
{
char str[5];
cout<<"Enter String: ";
cin>>str;
cout<<"String is: "<<str<<endl;
}
I tried with many different input strings, and to my surprise, I got strange result:
Case I: Input String: 1234, Output: 1234 (No issue, as this is expected behavior)
Case II: Input String: 12345, Output: 12345 (NO error reported by compiler But I was expecting an Error: Array bounds Overflow.)
Case III: Input String: 123456, Output: 123456 (NO error reported by compiler But I was expecting an Error: Array bounds Overflow.)
.................................................
.................................................
Case VI: Input String: 123456789, Output: 123456789(Error: unhandeled exception. Access Violation.)
My doubt is, When I assigned more characters than its capacity in SECTION I, compiler reported ERROR: Array bounds Overflow.
But, when I am trying the same thing in SECTION II, I am not geting any errors. WHY it is so ?? Please note: I executed this on Visual Studio

char str[5] = "12345";
This is a compiletime error. You assign a string of length 6 (mind the appended null-termination) to an array of size 5.
char str[5];
cin>>str;
This may yield a runtime error. Depending on how long the string is you enter, the buffer you provide (size 5) may be too small (again mind the null-termination).
The compiler of course can't check your user input during runtime. If you're lucky, you're notified for an access violation like this by Segmentation Faults. Truly, anything can happen.
Throwing exceptions on access violations is not mandatory. To address this, you can implement array boundary checking yourself, or alternatively (probably better) there are container classes that adapt their size as necessary (std::string):
std::string str;
cin >> str;

What you are seeing is an undefined behavior. You are writing array out of bounds, anything might happen in that case ( (including seeing the output you expect).

I tried with many different input strings, and to my surprise, I got
strange result:
This phenomena is called Undefined behavior (UB).
As soon as you enter more characters than a char array can hold, you invite UB.
Sometime, it may work, it may not work sometime, it may crash. In short, there is no definite pattern.
[side note: If a compiler allows void main() to get compiled then it's not standard compliant.]

It's because in the second case, the compiler can't know. Only at runtime, there is a bug. C / C++ offer no runtime bounds checking by default, so the error is not recognized but "breaks" your program. However, this breaking doesn't have to show up immediately but while str points to a place in memory with only a fixed number of bytes reserved, you just write more bytes anyways.
Your program's behavior will be undefined. Usually, and also in your particular case, it may continue working and you'll have overwritten some other component's memeory. Once you write so much to access forbidden memory (or free the same memory twice), your program crashes.

char str[5] = "12345" - in this case you didn't leave room to the null terminator. So when the application tries to print str, it goes on and on as long as a null is not encountered, eventually stepping on foridden memory and crashing.
In the `cin case, the cin operation stuffed 0 at the end of the string, so this stops the program from going too far in memory.
Hoever, from my experience, when breaking the rules with memory overrun, things go crazy, and looking for a reason why in one case it works while in other it doesn't, doesn't lead anywhere. More often than not, the same application (with a memory overrun) can work on one PC and crash on another, due to different memory states on them.

because compiler does static check. In your section II, size is unknown in advance. It depends on your input length.
May i suggest you to use STL string ?

Related

cout prints char[] containing more characters than set length? [duplicate]

This question already has answers here:
What is a buffer overflow and how do I cause one?
(12 answers)
Closed 5 years ago.
#include<iostream>
using namespace std;
int main(void)
{
char name[5];
cout << "Name: ";
cin.getline(name, 20);
cout << name;
}
Output:
Name: HelloWorld
HelloWorld
Shouldn't this give an error or something?
Also when I write an even longer string,
Name: HelloWorld Goodbye
HelloWorld Goodbye
cmd exits with an error.
How is this possible?
Compiler: G++ (GCC 7), Nuwen
OS: Windows 10
It's called buffer overflow and is a common source of code bugs and exploits. It's the developers responsibility to ensure it doesn't happen. character strings wil be printed until they reach the first '\0' character
The code produces "undefined behavior". This means, anything might happen. In your case, the program works unexpectedly. It might however do something completely different with different compiler flags or on a different system.
Shouldn't this give an error or something.
No. The compiler cannot know that you will input a long string, thus there cannot be any compiler error. You also don't throw any runtime exception here. It is up to you to make sure the program can handle long strings.
Your code has encountered UB, also known as undefined behaviour, which, as Wikipedia defines, the result of executing computer code whose behavior is not prescribed by the language specification to which the code adheres. It usually occurs when you do note define variables properly, in this case a too small char array.
Even -Wall flag will not give any warning. So you can use tools like valgrind and gdb to detect memory leaks and buffer overflows
You can check those questions:
Array index out of bound in C
No out of bounds error
They have competent answers.
My short answer, based on those already given in the questions I posted:
Your code implements an Undefined Behavior(buffer overflow) so it doesn't give an error when you run it once. But some other time it may give. It's a chance thing.
When you enter a longer string, you actually corrupt the memory (stack) of the program (i.e you overwrite the memory which should contain some program-related data, with your data) and so the return code of your program ends up being different than 0 which interprets as an error. The longer the string, the higher the chance of screwing things up (sometimes even short strings screw things up)
You can read more here: https://en.wikipedia.org/wiki/Buffer_overflow

Memory in a char[]

So I was working in c++ 11 and this occurred to me:
char s[100];
strcpy(s, "Andrei");
int n=strlen(Andrei); // (it would be 6)
I do this:
s[n]=' ';
s[n+9]='\0';
What happens with s[n+1], ... s[n+8]?
If I go
cout <<s;
The array will be displayed as normal.
But if I specify
cout <<s[n+2]
it will print odd characters
The array elements s[n+1] ... s[n+8] are uninitialized. They are said to have indeterminate value. Trying to output an indeterminate value causes undefined behaviour which means anything can happen.
So, you have a buffer of 100 chars. Due the fact there is no strict initialization policy in C++, you got a buffer of BIG SOMETHING. Content of that "big something" would differ between each app launch, may vary on different build modes (debug/release) etc. Indeed, querying that "something" is like trying to listen a radio on a random wave and hear a noise sequence in return (it could be different each time you do this).
Running your strcpy would initialize just first N chars, the rest of the buffer is still "something".

Value of variable changes

In the code below, I am trying to store the value of array at index 0 in the temp variable. In this line of code: a[i-1]=a[i]-a[i-1]; when i=0, a[i-1] becomes a[-1].
Why is compiler not giving any error?
Why does the value of temp variable is affected and becomes zero after the first iteration, though it is assigned a value only when i=0 and temp is not used anywhere else?
For example, when I gave input as:
3 1 2 3
Output:
i:0
a[0]: 1
TEMP: 1
TEMP: 0
TEMP: 0
TEMP: 0
What's actually happening? Please explain with reference to the working of compiler. I know that if I put a condition if(i!=0) a[i-1]=a[i]-a[i-1]; the code will work normally. But I want to know why is this happening with the given scenario.
#include<bits/stdc++.h>
using namespace std;
int main()
{
int a[10],i,n,temp;
cin>>n;
for(i=0;i<n;i++){
cin>>a[i];
if(i==0){
temp=a[i];
cout<<"i: "<<i<<endl;
cout<<"a[0]: "<<a[i]<<endl;
}
cout<<"TEMP: "<<temp<<endl;
a[i-1]=a[i]-a[i-1];
}
cout<<endl<<"TEMP: "<<temp;
}
Why is compiler not giving any error?
The compiler is not required to give any error. Accessing array out of bounds has undefined behaviour. It might seem obvious that the array is going to be accessed out of bounds at run time (then again, perhaps not so obvious since the author of the program didn't catch it before running the program), but it would be prohibitively expensive for the compiler to check execution paths in search for bugs in general.
Why does the value of temp variable is affected and becomes zero after the first iteration, though it is assigned a value only when i=0 and temp is not used anywhere else?
Because the behaviour of the program is undefined.
It is usually pointless to analyze why program behaves in some way, when it is allowed to behave in any possible way. However, this case it seems that most likely: When you write out of bounds, you overwrite some memory that isn't part of the array. Other variables may be located in the memory that isn't part of the array. Therefore overwriting some memory that isn't part of the array may corrupt the value of some other variable. That is what you observed.
Why is compiler not giving any error?
As #user2079303 already mentioned, such error is too difficult for compiler to find. There're separate tools for finding such more complicated errors - static and dynamic code analyzers. For example, if you use default static code analyzer from VS 2015, it gives the following warnings:
Severity Code Description
Warning C4701 potentially uninitialized local variable 'temp' used
Warning C6385 Reading invalid data from 'a': the readable size is '40' bytes, but '-4' bytes may be read.
Warning C6386 Buffer overrun while writing to 'a': the writable size is '40' bytes, but '-4' bytes might be written.
Warning C6001 Using uninitialized memory 'temp'.
If you want extra safety for your projects, consider enabling all (or almost all) available warnings and run code analysis regularly.

Why am I not getting a segmentation fault with this code? (Bus error)

I had a bug in my code that went like this.
char desc[25];
char name[20];
char address[20];
sprintf (desc, "%s %s", name, address);
Ideally this should give a segfault. However, I saw this give a bus error.
Wikipedia says something to the order of 'Bus error is when the program tries to access an unaligned memory location or when you try to access a physical (not virtual) memory location that does not exist or is not allowed. '
The second part of the above statement sounds similar to a seg fault. So my question is, when do you get a SIGBUS and when a SIGSEGV?
EDIT:-
Quite a few people have mentioned the context. I'm not sure what context would be needed but this was a buffer overflow lying inside a static class function that get's called from a number of other class functions. If there's something more specific that I can give which will help, do ask.
Anyways, someone had commented that I should simply write better code. I guess the point of asking this question was "can an application developer infer anything from a SIGBUS versus a SIGSEGV?" (picked from that blog post below)
As you probably realize, the base cause is undefined behavior in your
program. In this case, it leads to an error detected by the hardware,
which is caught by the OS and mapped to a signal. The exact mapping
isn't really specified (and I've seen integral division by zero result
in a SIGFPE), but generally: SIGSEGV occurs when you access out of
bounds, SIGBUS for other accessing errors, and SIGILL for an illegal
instruction. In this case, the most likely explination is that your
bounds error has overwritten the return address on the stack. If the
return address isn't correctly aligned, you'll probably get a SIGBUS,
and if it is, you'll start executing whatever is there, which could
result in a SIGILL. (But the possibility of executing random bytes as
code is what the standards committee had in mind when they defined
“undefined behavior”. Especially on machines with no memory
protection, where you could end up jumping directly into the OS.)
A segmentation fault is never guaranteed when you're doing fishy stuff with memory. It all depends on a lot of factors (how the compiler lays out the program in memory, optimizations etc).
What may be illegal for a C++ program may not be illegal for a program in general. For instance the OS doesn't care if you step outside an array. It doesn't even know what an array is. However it does care if you touch memory that doesn't belong to you.
A segmentation fault occurs if you try to do a data access a virtual address that is not mapped to your process. On most operating systems, memory is mapped in pages of a few kilobytes; this means that you often won't get a fault if you write off the end of an array, since there is other valid data following it in the memory page.
A bus error indicates a more low-level error; a wrongly-aligned access or a missing physical address are two reasons, as you say. However, the first is not happening here, since you're dealing with bytes, which have no alignment restriction; and I think the second can only happen on data accesses when memory is completely exhausted, which probably isn't happening.
However, I think you might also get a bus error if you try to execute code from an invalid virtual address. This could well be what is happening here - by writing off the end of a local array, you will overwrite important parts of the stack frame, such as the function's return address. This will cause the function to return to an invalid address, which (I think) will give a bus error. That's my best guess at what particular flavour of undefined behaviour you are experiencing here.
In general, you can't rely on segmentation faults to catch buffer overruns; the best tool I know of is valgrind, although that will still fail to catch some kinds of overrun. The best way to avoid overruns when working with strings is to use std::string, rather than pretending that you're writing C.
In this particular case, you don't know what kind of garbage you have in the format string. That garbage could potentially result in treating the remaining arguments as those of an "aligned" data type (e.g. int or double). Treating an unaligned area as an aligned argument definitely causes SIGBUS on some systems.
Given that your string is made up of two other strings each being a max of 20 characters long, yet you are putting it into a field that is 25 characters, that is where your first issue lies. You are have a good potential to overstep your bounds.
The variable desc should be at least 41 characters long (20 + 20 + 1 [for the space you insert]).
Use valgrind or gdb to figure out why you are getting a seg fault.
char desc[25];
char name[20];
char address[20];
sprintf (desc, "%s %s", name, address);
Just by looking at this code, I can assume that name and address each can be 20 chars long. If that is so, then does it not imply that desc should be minimum 20+20+1 chars long? (1 char for the space between name and address, as specified in the sprintf).
That can be the one reason of segfault. There could be other reasons as well. For example, what if name is longer than 20 chars?
So better you use std::string:
std::string name;
std::string address;
std::string desc = name + " " + address;
char const *char_desc = desc.str(); //if at all you need this

c++ what happens if you print more characters with sprintf, than the char pointer has allocated?

I assume this is a common way to use sprintf:
char pText[x];
sprintf(pText, "helloworld %d", Count );
but what exactly happens, if the char pointer has less memory allocated, than it will be print to?
i.e. what if x is smaller than the length of the second parameter of sprintf?
i am asking, since i get some strange behaviour in the code that follows the sprintf statement.
It's not possible to answer in general "exactly" what will happen. Doing this invokes what is called Undefined behavior, which basically means that anything might happen.
It's a good idea to simply avoid such cases, and use safe functions where available:
char pText[12];
snprintf(pText, sizeof pText, "helloworld %d", count);
Note how snprintf() takes an additional argument that is the buffer size, and won't write more than there is room for.
This is a common error and leads to memory after the char array being overwritten. So, for example, there could be some ints or another array in the memory after the char array and those would get overwritten with the text.
See a nice detailed description about the whole problem (buffer overflows) here. There's also a comment that some architectures provide a snprintf routine that has a fourth parameter that defines the maximum length (in your case x). If your compiler doesn't know it, you can also write it yourself to make sure you can't get such errors (or just check that you always have enough space allocated).
Note that the behaviour after such an error is undefined and can lead to very strange errors. Variables are usually aligned at memory locations divisible by 4, so you sometimes won't notice the error in most cases where you have written one or two bytes too much (i.e. forget to make place for a NUL), but get strange errors in other cases. These errors are hard to debug because other variables get changed and errors will often occur in a completely different part of the code.
This is called a buffer overrun.
sprintf will overwrite the memory that happens to follow pText address-wise. Since pText is on the stack, sprintf can overwrite local variables, function arguments and the return address, leading to all sorts of bugs. Many security vulnerabilities result from this kind of code — e.g. an attacker uses the buffer overrun to write a new return address pointing to his own code.
The behaviour in this situation is undefined. Normally, you will crash, but you might also see no ill effects, strange values appearing in unrelated variables and that kind of thing. Your code might also call into the wrong functions, format your hard-drive and kill other running programs. It is best to resolve this by allocating more memory for your buffer.
I have done this many times, you will receive memory corruption error. AFAIK, I remember i have done some thing like this:-
vector<char> vecMyObj(10);
vecMyObj.resize(10);
sprintf(&vecMyObj[0],"helloworld %d", count);
But when destructor of vector is called, my program receive memory corruption error, if size is less then 10, it will work successfully.
Can you spell Buffer Overflow ? One possible result will be stack corruption, and make your app vulnerable to Stack-based exploitation.