Why Does Array With 1 element Allow 2k elements? [duplicate]

Why Does Array With 1 element Allow 2k elements? [duplicate] - c++

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Why don’t i get “Segmentation Fault”?
Why does this code work? If the first element only hold the first characer, then where are the rest of characters being stored? And if this is possible, why aren't we using this method?
Notice line 11: static char c[1]. Using one element, you can store as much characters as you want. I use static to keep the memory location alive outside of the function when pointing to it later.
#include <stdio.h>
void PutString( const char* pChar ){
for( ; *pChar != 0; pChar++ )
{
putchar( *pChar );
}
}
char* GetString(){
static char c[1];
int i = 0;
do
{
c[i] = getchar();
}while( c[i++] != '\n' );
c[i] = '\0';
return c;
}
void main(){
PutString( "Enter some text: " );
char* pChar = GetString();
PutString( "You typed the following:\n" );
PutString( pChar );
}

C doesn't check for array boundaries, so no error is thrown. However, characters after the first one will be stored in memory not allocated by the program. If the string is short, this may work, but a long enough string will corrupt enough memory to crash the process.

You can write wherever you want:
char *bad = 0xABCDEF00;
bad[0] = 'A';
But you shouldn't. Who knows what the above lines of code will do? In the very best case, your program will crash. In the worst case, you've corrupted memory and won't find out until much later. (And good luck tracking down the source!)
To answer your specific questions, it doesn't "work". The rest of the characters are stored directly after the array.

You are just very (un)lucky, that you are not overwriting some other data structures. The array definitely cannot store as much characters as you want - sooner or later you either silently corrupt your memory (in the worse case), or hit a segfault by accesing a memory your process hasn't mapped. The fact that it works is likely because the compiler didn't place any other data after your c[1]. Just try to add a second array, let's say static char d[1]; after c, and then try reading from it - you'll see the second character from c.

C++ does not do bounds checking on arrays. That's for performance reasons; checking every array index to see if it's outside the bounds would incur an unacceptable runtime overhead. Avoiding overhead has always been a design goal of C++.
If you want bounds checking, you should use std::vector instead, which does provides it as an optional feature through std::vector::at().

In this case the behavior is undefined: according to the compiler / the current state of the memory / ..., it may seems to run fine, it may write corrupted chars, or it may crash because of a sefault.
Linking against Electric Fence or running in valgrind may help to find such errors at runtime.

Related

For-Loop on Pointer: Does it move the whole address range?

I have this bit of code:
int count_x(char* p, char x)
{
if (p == nullptr) return 0;
int count = 0;
for (; *p != 0; p++)
{
if (*p == x)
count++;
}
return count;
}
The input is in this case a char-array:
char v[] = { "Ich habe einen Beispielsatz erstellt!"};
Since I am currently looking into CPP with the book "C++ the programming language - 4th Edition" I got the code from there and am currently trying to figure it out.
When stepping through it, I noticed that the for loop moves the memory address in increments of one. This is not very surprising to me, but the following question arose and I couldn't find an answer yet:
Does this loop reduce the overall memory range or is the whole range being moved?
Since to my knowlege you use a "block" in whole for storing such a char-array (or any type of array), I guess it is the later since I don't see anything reducing the boundries.
But with that "knowledge" I have to ask: Doesn't this cause major issues as it would be theoretically possible to read parts of the memory the programm shouldn't have access to?
Will this be something I have to keep in mind when dealing with (very) long arrays?

You are not moving anything at all. That's where your confusion comes from. Your code is perfectly safe for very very long strings, don't worry. (Apart that count may overflow...).
You are right that p is incremented in every iteration of the loop, but that doesn't mean that anything is being moved. You are only modifying the value of the pointer p (and count). That's it. Effectively, you are traversing your RAM.
You are right however that you might read in memory that you don't own, but that is the callers fault because count_x's preconditions require that you pass in a null-terminated string, and if you don't, well, you get undefined behavior for accessing memory you don't own. That's why you should use std::string instead of char*, which is guaranteed to be null-terminated (if you use C++11 or higher).

In C and C++, "strings like this" are implicitly nul-terminated. That means they end with a char whose value is 0 or '\0' (same thing).
So this loop:
for (; *p != 0; p++)
advances p until it reaches a point where *p is 0 -- the end of the string.
If p does not point within a nul-terminated buffer or string, the loop will indeed move over memory beyond the end of the memory buffer it started in. This kind of error is common and relying on strings being nul-terminated results (indirectly) in a lot of buffer overruns, security holes, and generally crashing and memory corrupting programs.
To get around this, C++ offers alternative ways to store and interact with strings of characters, including std::string. These do not rely on the properly positioned nul terminator to work, although much C-style code they interact with may.
And in C++17, string view provides a non-owning low-cost way to refer to a bounded size string with no nul terminator.

No problem with (very)long array. Because count_x(), you pass the address of the first char of array. So no overhead for any length of array. You can you the index with point to traverse on all the array. Look like
unsigned int count_x(char* p, char x)
{
if (p == nullptr) return 0;
unsigned int count = 0;
for (unsigned int i=0; i<strlen(p); i++)
{
if (p[i] == x)
count++;
}
return count;
}

char* issue in C++ [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 8 years ago.
Improve this question
Why this fragment of code does not work? I know that all entered strings have length less than 20 symbols.I do not use std::string because I want to learn how to use char*
#include <map>
#include <stdio.h>
using namespace std;
map <char*,int> cnt;
int main()
{
char ans[20];
int n,mx = 0;
scanf("%d\n",&n);
for ( int i = 1; i <= n; i++){
char str[20];
gets(str);
cnt[str]++;
}
for ( auto i = cnt.begin(); i != cnt.end(); i++ )
puts(i->first);
}

Let's be clear that your code has a lot of undefined behavior. I tried running your code and here is what I saw on my machine. You should tell us what your behavior was though because it's impossible to say what's going on for you otherwise.
First off, here was my program input.
3
hello
world
cat
And the output...
cat
char str[20] is a memory address, and that address is being reused by the compiler. Let's say that memory address is 0xABCD.
So on the first iteration, the map contains one element which is { 0xABCD, 1 }. On the second iteration it contains the same element with its value incremented, {0xABCD, 2}. On the third iteration it contains {0xABCD, 3}. Then when you go to print the map, it finds only one element in the map, and prints that memory address. This memory address happens to contain the word "cat", so it prints cat.
But this behavior is not reliable. The array char str[20] doesn't exist outside of the for loop, so sticking it into map <char *, int> cnt and even worse printing the array outside the loop are both undefined behavior.
If you want your code to work, I suppose you could do this....
for ( int i = 1; i <= n; i++){
char * str = new char[20];
gets(str);
cnt[str]++;
}
for ( auto i = cnt.begin(); i != cnt.end(); i++ )
puts(i->first);
for ( auto i = cnt.begin(); i != cnt.end(); i++ )
delete[](i->first);
But really, the correct strategy here is to either....
1) Use std::string
or
2) Don't use std::map
If you want to use C strings beyond converting them to std::string, then program without the use of the C++ std library. Stick to the C standard library.

Seems like cnt is std::map<char*, ...>. When you do cnt[str] you use pointer to local variable str as key, but str is only valid during single iteration. After that str is freed (semantically, optimizer may reuse it, but it is irrelevant here) and pointer to it is no longer valid.

It's very simple: when you allocate a C-style array as a local variable (char str[20];), it is allocated on the stack. It behaves just like any other object that you allocate as a local variable. And when it falls out of scope, it will be destroyed.
When you try to pass the array to the map in cnt[str], the array name decays to a pointer to the first element (it implicitely converts an expression of type char[20] into an expression of type char*). This is something radically different than an array. The map only ever sees this single pointer and stores it as the key. The map does not dereference the pointer to find out what's behind it, it just uses the memory location.
To fix your code, you need to do two things:
You need to allocate memory for your strings on the heap, so that the char* remains valid after the end of the scope. The easiest way to do this is to use the getline() or getdelim() functions available in the POSIX-2008 standard: These beautiful two functions will actually do the malloc() call for you. However, you still need to remember to free the string afterwards.
Making the map aware that you are talking about strings and not about memory addresses is much harder to achieve. If you must use a map, you likely need to define your own std::string-like wrapper class. But I guess, since you are playing around with the char* to learn their use, it would be more prudent to use some other kind of list and program the logic to check whether the given string is already in the list. Could be an array of char*, probably sorted to save lookup time, or a linked list, or whatever you like. For ease, you can just use an std::vector<char*>, but don't forget to free your strings before letting the vector fall out of scope.

Is this legal in C++?

I created a C++/CLI wrapper calling a third party code, which happened to end in corrupted memory. So I'm suspecting that maybe the code wasn't legal in C++
below is the code that crashed:
void Init_4bit_tab(unsigned char *dest,unsigned char *source)
{
unsigned char masque,i;
masque=0x08;
for(i=0; i<4; i++) {
dest[i] = (*source & masque)>>(3-i);
masque >>= 1;
}
}
the exact error was:
Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
Update:
After scanning the 3rd party code, It appears to be multidimensional array, because of the way it was passed, but I'm still not sure what's causing the problem:
the source function
unsigned char Data_B[81];
...
S_Box_Calc(&Data_B[33]);
void S_Box_Calc(unsigned char *vect)
{
unsigned char *S_Box[8];
unsigned lig,col,i;
S_Box[0]=S1;
S_Box[1]=S2;
S_Box[2]=S3;
S_Box[3]=S4;
S_Box[4]=S5;
S_Box[5]=S6;
S_Box[6]=S7;
S_Box[7]=S8;
for(i=0;i<8;i++) {
col= 8*vect[1+6*i] + 4*vect[2+6*i] + 2*vect[3+6*i] + vect[4+6*i];
lig= 2*vect[6*i] + vect[5+6*i];
Init_4bit_tab(&vect[4*i],&S_Box[i][col+lig*16]);
}
}
Update 2:
I checked the values on debug mode the dest and source are not null. however if I tried to quick watch (*source & masque) under this code dest[i] = (*source & masque)>>(3-i);
I get this error
(*source & masque) error: & cannot be performed on '*source' and 'masque'
Update 3:
S1...Sn was originally defined on the global scope of the file, but I get an error when I left it as is, so I initialized them in the constructor this way:
unsigned char lS1[64] = {
14,4,13,1,2,15,11,8,3,10,6,12,5,9,0,7,
0,15,7,4,14,2,13,1,10,6,12,11,9,5,3,8,
4,1,14,8,13,6,2,11,15,12,9,7,3,10,5,0,
15,12,8,2,4,9,1,7,5,11,3,14,10,0,6,13
};
std::copy(S1, S1 + 64, lS1);
could this be the problem?

There's no problem with the code you show, if it is passed
valid pointers. If it's corrupting memory, it's probably
because the caller didn't pass it valid pointers.
After your edit: if S_Box_Calc is called with vect equal to
Data_B + 33, as you show, the range [vect, vect+48) is
legal, which means that Init_4bit_tab should not be called
with a value superior to 44. In fact, in the code you show, it
is never called with a value greater than 28, so you shouldn't be able
to corrupt memory here. If any of S1 through S8, however, do
not point to valid memory, you'll get the symptoms you state.

This is perfectly legal C++ syntactically, it compiles. Check if it's semantically correct. The only place where you could inadvertently tread in the UB land is in accessing dest pointer i.e. The array it is pointing to should atleast be 4 chars long from where it's pointing to. Also since the error talks about access violation, make sure dest is pointing to a writable memory location.

Where are your overflow checks? You should be passing in sizes to your functions to allow you to restrict writes to memory if there is a chance that it will overflow. It's similar to strcpy() vs. strncpy() or strlcpy() in BSD. Perhaps if you implement something along those lines and generate an error where there is a condition that memory written would otherwise overflow, you might find the cause of memory corruption.

Stack overflow for string in C++?

I made a small program that looked like this:
void foo () {
char *str = "+++"; // length of str = 3 bytes
char buffer[1];
strcpy (buffer, str);
cout << buffer;
}
int main () {
foo ();
}
I was expecting that a stack overflow exception would appear because the buffer had smaller size than the str but it printed out +++ successfully... Can someone please explain why would this happened ?
Thank you very much.

Undefined Behavior(UB) happened and you were unlucky it did not crash.
Writing beyond the bounds of allocated memory is Undefined Behavior and UB does not warrant a crash. Anything might happen.
Undefined behavior means that the behavior cannot be defined.

You don't get a stack overflow because it's undefined behaviour, which means anything can happen.
Many compilers today have special flags that tell them to insert code to check some stack problems, but you often need to explicitly tell the compiler to enable that.

Undefined behavior...
In case you actually care about why there's a good chance of getting a "correct" result in this case: there are a couple of contributing factors. Variables with auto storage class (i.e., normal, local variables) will typically be allocated on the stack. In a typical case, all items on the stack will be a multiple of some specific size, most often int -- for example, on a typical 32-bit system, the smallest item you can allocate on the stack will be 32 bits. In other words, on your typical 32-bit system, room for four bytes (of four chars, if you prefer that term).
Now, as it happens, your source string contained only 3 characters, plus the NUL terminator, for a total of 4 characters. By pure bad chance, that just happened to be short enough to fit into the space the compiler was (sort of) forced to allocate for buffer, even though you told it to allocate less.
If, however, you'd copied a longer string to the target (possibly even just a single byte/char longer) chances of major problems would go up substantially (though in 64-bit software, you'd probably need longer still).
There is one other point to consider as well: depending on the system and the direction the stack grows, you might be able to write well the end of the space you allocated, and still have things appear to work. You've allocated buffer in main. The only other thing defined in main is str, but it's just a string literal -- so chances are that no space is actually allocated to store the address of the string literal. You end up with the string literal itself allocated statically (not on the stack) and its address substituted where you've used str. Therefore, if you write past the end of buffer, you may be just writing into whatever space is left at the top of the stack. In a typical case, the stack will be allocated one page at a time. On most systems, a page is 4K or 8K in size, so for a random amount of space used on the stack, you can expect an average of 2K or 4K free respectively.
In reality, since this is in main and nothing else has been called, you can expect the stack to be almost empty, so chances are that there's close to a full page of unused space at the top of the stack, so copying the string into the destination might appear to work until/unless the source string was quite long (e.g., several kilobytes).
As to why it will often fail much sooner than that though: in a typical case, the stack grows downward, but the addresses used by buffer[n] will grow upward. In a typical case, the next item on the stack "above" buffer will be the return address from main to the startup code that called main -- therefore, as soon as you write past the amount of space on the stack for buffer (which, as above, is likely to be larger than you specified) you'll end up overwriting the return address from main. In that case, the code inside main will often appear to work fine, but as soon as execution (tries to) return from main, it'll end up using that data you just wrote as the return address, at which point you're a lot more likely to see visible problems.

Outlining what happens:
Either you are lucky and it crashes at once. Or because it's undefined technically you could end up writing to a memory address used by something else. say that you had two buffers, one buffer[1] and one longbuffer[100] and assume that the memory address at buffer[2] could be the same as longbuffer[0] which would mean that long buffer now terminates at longbuffer[1] (because the null-termination).
char *s = "+++";
char longbuffer[100] = "lorem ipsum dolor sith ameth";
char buffer[1];
strcpy (buffer, str);
/*
buffer[0] = +
buffer[1] = +
buffer[2] = longbuffer[0] = +
buffer[3] = longbuffer[0] = \0 <- since assigning s will null terminate (i.e. add a \0)
*/
std::cout << longbuffer; // will output: +
Hope that helps in clarifying please note it's not very likely that these memory addresses will be the same in the random case, but it could happen, and it doesn't even need to be the same type, anything can be at buffer[2] and buffer[3] addresses before being overwritten by the assignment. Then the next time you try to use your (now destroyed) variable it might well crash, and thats when debugging become a bit tedious since the crash doesn't seem to have much to do with the real problem. (i.e. it crashes when you try to access a variable on your stack while the real problem is that you somewhere else in your code destroyed it).

There is no explicit bounds checking, or exception throwing on strcpy - it's a C function. If you want to use C functions in C++, you're going to have to take on the responsibility of checking for bounds etc. or switch to using std::string.
In this case it did work, but in a critical system, taking this approach might mean that your unit tests pass but in production, your code barfs - not a situation that you want.

Stack corruption is happening, its an undefined behaviour, luckily crash didnt occur. Do the below modifications in your program and run it will crash surely because of stack corruption.
void foo () {
char *str = "+++"; // length of str = 3 bytes
int a = 10;
int *p = NULL;
char buffer[1];
int *q = NULL;
int b = 20;
p = &a;
q = &b;
cout << *p;
cout << *q;
//strcpy (buffer, str);
//Now uncomment the strcpy it will surely crash in any one of the below cout statment.
cout << *p;
cout << *q;
cout << buffer;
}

Simple examples with 2 dimensional wchar_t string array

Can somebody explain why next code output 26 timez 'Z' instead range from 'A' to 'Z', and how can I output this array correct. Look at code:
wchar_t *allDrvs[26];
int count = 0;
for (int n=0; n<26; n++)
{
wchar_t t[] = {L'A' + n, '\0'};
allDrvs[n] = t;
count++;
}
int j;
for(j = 0; j < count; j++)
{
std::wcout << allDrvs[j] << std::endl;
}

The problem (at least one) is:
{
wchar_t t[] = {L'A' + n, '\0'};
allDrvs[n] = t; //allDrvs points to t
count++;
} //t is deallocated here
//allDrvs[n] is a dangling pointer
So, short answer - undefined behavior on the line std::wcout << allDrvs[j].
To get a correct output - there's a crappy ugly version involving dynamic allocation and copying between arrays.
Then there's the correct version of using a std::vector<std::wstring> >.

Your t[] is on the stack; it only exists for one iteration of the loop at a time, and the next iteration appears to be reusing that space - not a behaviour that's required, but this seems to be what's happening based on your results. If you examine allDrvs[] with a debugger after the first loop completes, you'll probably see all the pointers point to the same memory location.
There's a variety of ways you could solve this. You can allocate a new t on the heap for each loop iteration (and delete them afterwards). You could do wchar_t allDrvs[26][2]; instead of wchar_t *allDrvs[26], and copy the contents of t over each iteration. You could display t right away in the first loop, instead of doing it later. You could use std::vector and std::wstring to manage things for you, instead of using arrays and pointers.

Your code has undefined behavior. Your t has automatic storage duration, so as soon as you exit the upper loop, it ceases to exist. Your allDrvs contains 26 pointers to objects that have been destroyed by the time you use them in the second loop.
As it happens, it looks like (under the circumstances you're running it, with the compiler you're using, etc.) what's happening is that it's re-using the same storage space for t at ever iteration of the loop, and when you use allDrvs in the second loop, that storage hasn't been overwritten, so you have 26 pointers to the same data.
Since you're using C++ anyway, I'd advise using std::wstring and probably std::vector instead -- for example, something on this general order:
std::vector<std::wstring> allDrvs;
for (char i=L'A'; i<L'Z'; i++)
allDrvs.push_back(std::wstring(i));
Technically, this isn't entirely portable -- it depends on 'A' .. 'Z' being contiguous, which isn't true with all character sets, IBM's EBCDIC being the obvious exception. Even in that case, it'll produce all the right outputs, but it'll also include a few additional items you didn't really want.
Nonetheless, the original depended on 'A'..'Z' being contiguous, and the code looks like it's probably intended for Windows anyway, so that's probably not really a big concern.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js