For-Loop on Pointer: Does it move the whole address range? - c++

I have this bit of code:
int count_x(char* p, char x)
{
if (p == nullptr) return 0;
int count = 0;
for (; *p != 0; p++)
{
if (*p == x)
count++;
}
return count;
}
The input is in this case a char-array:
char v[] = { "Ich habe einen Beispielsatz erstellt!"};
Since I am currently learning C++ with the book "The C++ Programming Language, 4th Edition", I got the code from there and am trying to figure it out.
When stepping through it, I noticed that the for loop moves the memory address in increments of one. This is not very surprising to me, but the following question arose and I couldn't find an answer yet:
Does this loop reduce the overall memory range or is the whole range being moved?
Since, to my knowledge, a whole "block" is used for storing such a char array (or any type of array), I guess it is the latter, since I don't see anything reducing the boundaries.
But with that "knowledge" I have to ask: Doesn't this cause major issues, as it would theoretically be possible to read parts of the memory the program shouldn't have access to?
Will this be something I have to keep in mind when dealing with (very) long arrays?

You are not moving anything at all. That's where your confusion comes from. Your code is perfectly safe for very, very long strings, don't worry. (Apart from the fact that count may overflow...)
You are right that p is incremented in every iteration of the loop, but that doesn't mean that anything is being moved. You are only modifying the value of the pointer p (and count). That's it. Effectively, you are traversing your RAM.
You are right, however, that you might read memory that you don't own, but that is the caller's fault: count_x's preconditions require that you pass in a null-terminated string, and if you don't, well, you get undefined behavior for accessing memory you don't own. That's why you should use std::string instead of char*; a std::string is guaranteed to be null-terminated (if you use C++11 or higher).
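To make the "nothing is moved" point concrete, here is a minimal sketch. count_x is the loop from the question with const added; steps_until_terminator is a hypothetical helper, not something from the book, that only reports how far the local pointer copy travelled.

```cpp
#include <cstddef>

// Same loop as in the question; const is added because nothing is written.
int count_x(const char* p, char x)
{
    if (p == nullptr) return 0;
    int count = 0;
    for (; *p != 0; ++p)      // only this function's copy of the pointer changes
        if (*p == x)
            ++count;
    return count;
}

// Hypothetical helper: number of chars the loop steps over before the terminator.
std::size_t steps_until_terminator(const char* p)
{
    const char* start = p;    // remember where the traversal began
    while (*p != 0) ++p;      // advance the copy one char at a time
    return static_cast<std::size_t>(p - start);
}
```

Calling either function on an array such as char v[] = "banana"; leaves v, its address and its size untouched; the caller's array is exactly what it was before the call.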

In C and C++, "strings like this" are implicitly nul-terminated. That means they end with a char whose value is 0 or '\0' (same thing).
So this loop:
for (; *p != 0; p++)
advances p until it reaches a point where *p is 0 -- the end of the string.
If p does not point within a nul-terminated buffer or string, the loop will indeed move over memory beyond the end of the memory buffer it started in. This kind of error is common and relying on strings being nul-terminated results (indirectly) in a lot of buffer overruns, security holes, and generally crashing and memory corrupting programs.
To get around this, C++ offers alternative ways to store and interact with strings of characters, including std::string. These do not rely on the properly positioned nul terminator to work, although much C-style code they interact with may.
And in C++17, std::string_view provides a non-owning, low-cost way to refer to a bounded-size string with no nul terminator.
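As a rough sketch of the bounded alternative, assuming a C++17 compiler, the same count can be written against std::string_view. The function name is reused from the question for illustration only; the length travels with the view, so no terminator is ever read.

```cpp
#include <algorithm>
#include <cstddef>
#include <string_view>

// Counts x inside a bounded view; the loop ends at text.end(), not at a 0 byte.
std::size_t count_x(std::string_view text, char x)
{
    return static_cast<std::size_t>(std::count(text.begin(), text.end(), x));
}
```

This works for a buffer with no terminator at all, for example count_x(std::string_view(raw, 3), 'a') where raw is a plain array of three chars.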

Very long arrays are not a problem. You pass count_x() only the address of the first char of the array, so there is no extra overhead however long the array is. You can also traverse the whole array with an index instead of advancing the pointer. It looks like this:
unsigned int count_x(char* p, char x)
{
if (p == nullptr) return 0;
unsigned int count = 0;
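const std::size_t len = std::strlen(p); // from <cstring>; measured once instead of on every iteration
for (std::size_t i = 0; i < len; i++)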
{
if (p[i] == x)
count++;
}
return count;
}

Related

char* issue in C++ [closed]

Why does this fragment of code not work? I know that all entered strings are shorter than 20 characters. I do not use std::string because I want to learn how to use char*.
#include <map>
#include <stdio.h>
using namespace std;
map <char*,int> cnt;
int main()
{
char ans[20];
int n,mx = 0;
scanf("%d\n",&n);
for ( int i = 1; i <= n; i++){
char str[20];
gets(str);
cnt[str]++;
}
for ( auto i = cnt.begin(); i != cnt.end(); i++ )
puts(i->first);
}
Let's be clear that your code has a lot of undefined behavior. I tried running your code and here is what I saw on my machine. You should tell us what your behavior was though because it's impossible to say what's going on for you otherwise.
First off, here was my program input.
3
hello
world
cat
And the output...
cat
The array char str[20] lives at some memory address, and the compiler reuses that same address on every iteration. Let's say that memory address is 0xABCD.
So on the first iteration, the map contains one element which is { 0xABCD, 1 }. On the second iteration it contains the same element with its value incremented, {0xABCD, 2}. On the third iteration it contains {0xABCD, 3}. Then when you go to print the map, it finds only one element in the map, and prints that memory address. This memory address happens to contain the word "cat", so it prints cat.
But this behavior is not reliable. The array char str[20] doesn't exist outside of the for loop, so sticking it into map <char *, int> cnt and even worse printing the array outside the loop are both undefined behavior.
If you want your code to work, I suppose you could do this....
for ( int i = 1; i <= n; i++){
char * str = new char[20];
gets(str);
cnt[str]++;
}
for ( auto i = cnt.begin(); i != cnt.end(); i++ )
puts(i->first);
for ( auto i = cnt.begin(); i != cnt.end(); i++ )
delete[](i->first);
But really, the correct strategy here is to either....
1) Use std::string
or
2) Don't use std::map
If you want to use C strings beyond converting them to std::string, then program without the use of the C++ std library. Stick to the C standard library.
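For comparison, here is a minimal sketch of option 1. The function name count_lines and the choice to read from a std::istream are assumptions made for illustration; the input format (a count followed by that many lines) is taken from the question.

```cpp
#include <limits>
#include <map>
#include <sstream>   // only so the function can be tried with std::istringstream
#include <string>

// Reads a count n, then n lines, and tallies how often each distinct line occurs.
// Each key is a std::string that owns a copy of the text, so nothing dangles.
std::map<std::string, int> count_lines(std::istream& in)
{
    std::map<std::string, int> cnt;
    int n = 0;
    in >> n;
    in.ignore(std::numeric_limits<std::streamsize>::max(), '\n'); // skip rest of the count line
    std::string line;
    for (int i = 0; i < n && std::getline(in, line); ++i)
        ++cnt[line];   // compares by content, not by address
    return cnt;
}
```

With the input "3", "hello", "world", "cat" this yields three separate keys, because the map now compares the strings themselves rather than one reused address.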
Seems like cnt is std::map<char*, ...>. When you do cnt[str] you use pointer to local variable str as key, but str is only valid during single iteration. After that str is freed (semantically, optimizer may reuse it, but it is irrelevant here) and pointer to it is no longer valid.
It's very simple: when you allocate a C-style array as a local variable (char str[20];), it is allocated on the stack. It behaves just like any other object that you allocate as a local variable. And when it falls out of scope, it will be destroyed.
When you try to pass the array to the map in cnt[str], the array name decays to a pointer to the first element (an expression of type char[20] is implicitly converted into an expression of type char*). This is something radically different from an array. The map only ever sees this single pointer and stores it as the key. The map does not dereference the pointer to find out what's behind it, it just uses the memory location.
To fix your code, you need to do two things:
You need to allocate memory for your strings on the heap, so that the char* remains valid after the end of the scope. The easiest way to do this is to use the getline() or getdelim() functions available in the POSIX-2008 standard: These beautiful two functions will actually do the malloc() call for you. However, you still need to remember to free the string afterwards.
Making the map aware that you are talking about strings and not about memory addresses is much harder to achieve. If you must use a map, you likely need to define your own std::string-like wrapper class. But I guess, since you are playing around with the char* to learn their use, it would be more prudent to use some other kind of list and program the logic to check whether the given string is already in the list. Could be an array of char*, probably sorted to save lookup time, or a linked list, or whatever you like. For ease, you can just use an std::vector<char*>, but don't forget to free your strings before letting the vector fall out of scope.

What are the ramifications of simply checking a pointers value in a conditional statement?

Here goes my code:
#include <iostream>
using namespace std;
int countX(char*, char);
int main() {
char msg[] = "there are four a's in this sentence a a";
//char *ptr = msg; // <----- question 2!
cout << countX(msg, 'a');
cin.get();
}
int countX(char* ptr, char x) {
int c = 0;
for (; *ptr; ptr++) {
if (*ptr == x) c++;
}
/*
while(*ptr) {
if(*ptr==x) c++;
ptr++;
}
*/
return c;
}
I was wondering a few things specifically regarding safe practice and pointers:
My conditional statement in the for-loop ; *ptr ;, is this safe practice? Will it every break if there happens to be something stored in the memory address right next to the last element in the array? Is that even possible? How does it know when to terminate? When is *ptr deemed unacceptable?
(concerning the commented out char *ptr = msg; in the main): I understand a pointer and an array are very similar, however, is there a difference between passing the actual array to countX vs. passing a pointer (which points to the beginning of the array?).
In countX I've provided two different ways to approach the simple problem. Is one considered superior over the other?
Q My conditional statement in the for-loop ; *ptr ;, is this safe practice?
A Yes, most of the time. See below for more details.
Q Will it every (I know you meant ever) break if there happens to be something stored in the memory address right next to the last element in the array?
A Yes.
Q Is that even possible?
A Yes. You can easily access the memory one past the last character of the array and make it something other than the null character.
Q How does it know when to terminate?
A It will terminate when you encounter the terminating null character of a string. If the null character has been replaced by something else, the behavior is going to be unpredictable.
Q When is *ptr deemed unacceptable?
A If the string length is len, it is OK to set ptr in the range msg and msg+len. If ptr points to anything beyond that range, the behavior is undefined. Hence, they should be considered unacceptable in a program.
Q (concerning the commented out char *ptr = msg; in the main): I understand a pointer and an array are very similar, however, is there a difference between passing the actual array to countX vs. passing a pointer (which points to the beginning of the array?).
A No. They are identical.
Q In countX I've provided two different ways to approach the simple problem. Is one considered superior over the other?
A No they are not. It comes down to personal taste. I happen to like to use for loops while I know people that like to use while loops.
Q1 : My conditional statement in the for-loop ; *ptr ;, is this safe practice? Will it every break if there happens to be something stored in the memory address right next to the last element in the array? Is that even possible? How does it know when to terminate? When is *ptr deemed unacceptable?
Ans : When used with C-style strings, yes, it is a safe practice. Any C-style string necessarily ends with '\0', which is simply the value 0, so the loop stops at the end of the string. The behavior is undefined when the '\0' is not there: if ptr points to anything other than a C-style string, the loop has no reliable stopping point. For example, a C-style "hello" is actually an array containing 'h', 'e', 'l', 'l', 'o', '\0'. So the loop exits at '\0', never accessing the memory after it.
It is possible to form a pointer to the memory just after the last element of an array. For example,
int a[5] = {0,1,2,3,4};
int *p = a+5;
p points one past the last element of the array a. Forming this pointer is allowed, but dereferencing it is undefined behavior.
Q2 :(concerning the commented out char *ptr = msg; in the main): I understand a pointer and an array are very similar, however, is there a difference between passing the actual array to countX vs. passing a pointer (which points to the beginning of the array?).
Ans : Arrays and pointers are not exactly the same thing. In most expressions, an array name is converted (it "decays") to a pointer to the first element of the array. Consider the previous example I wrote. In that, a[3], 3[a], *(a+3) and *(p-2) all refer to the same element. Since you are passing by value, the pointer that msg decays to is simply copied into ptr. So, no, it would make no difference.
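A small sketch of where arrays and pointers do differ, and where they agree. Both helper names are hypothetical and exist only for illustration: array_length shows that an array still knows its size until it decays, and same_element checks the equivalent spellings of indexing.

```cpp
#include <cstddef>

// Binding the array by reference keeps its size N; a char* parameter would lose it.
template <std::size_t N>
constexpr std::size_t array_length(const char (&)[N]) { return N; }

// a[i], i[a] and *(a + i) are three spellings of the same element.
inline bool same_element(const int* a, int i)
{
    return &a[i] == a + i && &i[a] == a + i;
}
```

For char msg[] = "abc";, array_length(msg) and sizeof(msg) are both 4 (three letters plus the terminator), whereas inside countX the parameter is only a pointer and sizeof(ptr) is the size of a pointer.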
Q3 : In countX I've provided two different ways to approach the simple problem. Is one considered superior over the other?
Ans : I am not an expert, but I'd say no.
Also, you probably don't need the cin.get().
This is very bad practice.
What you're doing in the for condition is basically if (*ptr), in other words, does the memory pointed to by ptr contain a non-zero value?
So if the memory location after the string contains a non-zero value (maybe from another variable using the space) or a garbage value then your loop could go infinite, or give you an incorrect value. Instead you should run the loop from 0 to the length of your string.

100% of array correct in function, 75% of array correct in CALLING function - C

Note: I'm using a C++ compiler, which is why I can use pass by reference.
I have a strange problem, and I don't really know what's going on.
Basically, I have a text file: http://pastebin.com/mCp6K3HB
and I'm reading the contents of the text file in to an array of atoms:
typedef struct{
char * name;
char * symbol;
int atomic_number;
double atomic_weight;
int electrons;
int neutrons;
int protons;
} atom;
this is my type definition of atom.
void set_up_temp(atom (&element_record)[DIM1])
{
char temp_array[826][20];
char temp2[128][20];
int i=0;
int j=0;
int ctr=0;
FILE *f=fopen("atoms.txt","r");
for (i = 0; f && !feof(f) && i < 827; i++ )
{
fgets(temp_array[i],sizeof(temp_array[0]),f);
}
for (j = 0; j < 128; j++)
{
element_record[j].name = temp_array[ctr];
element_record[j].symbol = temp_array[ctr+1];
element_record[j].atomic_number = atol(temp_array[ctr+2]);
element_record[j].atomic_weight = atol(temp_array[ctr+3]);
element_record[j].electrons = atol(temp_array[ctr+4]);
element_record[j].neutrons = atol(temp_array[ctr+5]);
element_record[j].protons = atol(temp_array[ctr+6]);
ctr = ctr + 7;
}
//Close the file to free up memory and prevent leaks
fclose(f);
} //AT THIS POINT THE DATA IS FINE
Here is the function I'm using to read the data. When I debug this function and let it run right up to the end, I use the debugger to check its contents, and the array has 100% correct data, that is, all elements are what they should be relative to the text file.
http://i.imgur.com/SEq9w7Q.png This image shows what I'm talking about. On the left, all the elements, 0, up to 127, are perfect.
Then, I go down to the function I'm calling it from.
atom myAtoms[118];
set_up_temp(myAtoms); //AT THIS POINT DATA IS FINE
region current_button_pressed; // NOW IT'S BROKEN
load_font_named("arial", "cour.ttf", 20);
panel p1 = load_panel("atomicpanel.txt");
panel p2 = load_panel("NumberPanel.txt");
As soon as ANYTHING is called after I call set_up_temp, the elements 103 to 127 of my array turn into gibberish. As more things get called, EVEN MORE of the array turns to gibberish. This is weird, I don't know what's happening... Does anyone have any idea? Thanks.
for (j = 0; j < 128; j++)
{
element_record[j].name = temp_array[ctr];
You are storing, and then returning, pointers into temp_array, which is on the stack. The moment you return from the function, all of temp_array becomes invalid -- it's undefined behavior to dereference any of those pointers after that point. "Undefined behavior" includes the possibility that you can still read elements 0 through 102 with no trouble, but 103 through 127 turn to gibberish, as you say. You need to allocate space for these strings that will live as long as the atom object. Since as you say you are using C++, the easiest fix is to change both char * members to std::string. (If you don't want to use std::string, the second easiest fix is to use strdup, but then you have to free that memory explicitly.)
This may not be the only bug in this code, but it's probably the one causing your immediate problem.
In case you're curious, the reason the high end of the data is getting corrupted is that on most (but not all) computers, including the one you're using, the stack grows downward, i.e. from high addresses to low. Arrays, however, always index from low addresses to high. So the high end of the memory area that used to be temp_array is the part that's closest to the stack pointer in the caller, and thus most likely to be overwritten by subsequent function calls.
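A minimal sketch of the std::string fix, under two assumptions that the question does not confirm: the file holds seven lines per element in the order of the struct members, and the function name read_atoms is invented here. It reads from a std::istream so it can be tried without the actual atoms.txt.

```cpp
#include <limits>
#include <sstream>   // only so the function can be tried with std::istringstream
#include <string>
#include <vector>

struct atom {
    std::string name;      // owns its characters, so it stays valid after the reader returns
    std::string symbol;
    int atomic_number = 0;
    double atomic_weight = 0.0;
    int electrons = 0;
    int neutrons = 0;
    int protons = 0;
};

// Assumed layout: name, symbol, then five numeric lines, repeated per element.
std::vector<atom> read_atoms(std::istream& in)
{
    std::vector<atom> atoms;
    atom a;
    while (std::getline(in, a.name) && std::getline(in, a.symbol) &&
           (in >> a.atomic_number >> a.atomic_weight >> a.electrons
               >> a.neutrons >> a.protons)) {
        in.ignore(std::numeric_limits<std::streamsize>::max(), '\n'); // finish the last numeric line
        atoms.push_back(a);   // the vector grows as needed, so no fixed 118 or 128
    }
    return atoms;
}
```

Because the vector is returned by value and every atom owns its strings, nothing in the result points back into the reader's stack frame.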
Casual inspection yields this:
char temp_array[826][20];
...
for (i = 0; f && !feof(f) && i < 827; i++ )
Your code potentially allows i to become 826. Which means you're accessing the 827th element of temp_array. Which is one past the end. Oops.
Additionally, you are allocating an array of 118 atoms (atom myAtoms[118];) but you are setting 128 of them inside of set_up_temp in the for (j = 0; j < 128; j++) loop.
The moral of this story: Mind your indices and since you use C++ leverage things like std::vector and std::string and avoid playing with arrays directly.
Update
As Zack pointed out, you're returning pointers to stack-allocated variables which will go away when the set_up_temp function returns. Additionally, the fgets you use doesn't do what you think it does and it's HORRIBLE code to begin with. Please read the documentation for fgets and ask yourself what your code does.
You are allocating an array with space for 118 elements, but the code sets 128 of them, thus overwriting whatever happens to live right after the array.
Also, as others noted, you're storing in the array pointers to data that is temporary to the function (a no-no).
My suggestion is to start by reading a good book about C++ before programming because otherwise you're making your life harder for no reason. C++ is not a language in which you can hope to make serious progress by experimentation.

Why Does Array With 1 element Allow 2k elements? [duplicate]

Why does this code work? If the first element only holds the first character, then where are the rest of the characters being stored? And if this is possible, why aren't we using this method?
Notice the declaration static char c[1]. Using one element, you can apparently store as many characters as you want. I use static to keep the memory location alive outside of the function when pointing to it later.
#include <stdio.h>
void PutString( const char* pChar ){
for( ; *pChar != 0; pChar++ )
{
putchar( *pChar );
}
}
char* GetString(){
static char c[1];
int i = 0;
do
{
c[i] = getchar();
}while( c[i++] != '\n' );
c[i] = '\0';
return c;
}
void main(){
PutString( "Enter some text: " );
char* pChar = GetString();
PutString( "You typed the following:\n" );
PutString( pChar );
}
C doesn't check for array boundaries, so no error is thrown. However, characters after the first one will be stored in memory not allocated by the program. If the string is short, this may work, but a long enough string will corrupt enough memory to crash the process.
You can write wherever you want:
char *bad = (char *)0xABCDEF00;
bad[0] = 'A';
But you shouldn't. Who knows what the above lines of code will do? In the very best case, your program will crash. In the worst case, you've corrupted memory and won't find out until much later. (And good luck tracking down the source!)
To answer your specific questions, it doesn't "work". The rest of the characters are stored directly after the array.
You are just very (un)lucky that you are not overwriting some other data structures. The array definitely cannot store as many characters as you want: sooner or later you either silently corrupt your memory (in the worse case) or hit a segfault by accessing memory your process hasn't mapped. The fact that it works is likely because the compiler didn't place any other data after your c[1]. Just try to add a second array, let's say static char d[1];, after c, and then try reading from it; you may well see the second character from c.
C++ does not do bounds checking on arrays. That's for performance reasons; checking every array index to see if it's outside the bounds would incur an unacceptable runtime overhead. Avoiding overhead has always been a design goal of C++.
If you want bounds checking, you should use std::vector instead, which provides it as an optional feature through std::vector::at().
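A short sketch of what that optional check looks like in practice. The helper name at_rejects is hypothetical; it only reports whether at() refused the index by throwing std::out_of_range.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Returns true if at() rejected index i, false if the access was in range.
bool at_rejects(const std::vector<char>& v, std::size_t i)
{
    try {
        (void)v.at(i);   // checked access: throws instead of reading past the end
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```

With a one-element vector, index 0 is accepted and index 1 is rejected, whereas v[1] would silently be undefined behavior just like c[1] in the question.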
In this case the behavior is undefined: depending on the compiler, the current state of the memory, and so on, it may seem to run fine, it may write corrupted chars, or it may crash with a segfault.
Linking against Electric Fence or running in valgrind may help to find such errors at runtime.

Simple examples with 2 dimensional wchar_t string array

Can somebody explain why the following code outputs 'Z' 26 times instead of the range from 'A' to 'Z', and how I can output this array correctly? Look at the code:
wchar_t *allDrvs[26];
int count = 0;
for (int n=0; n<26; n++)
{
wchar_t t[] = {L'A' + n, '\0'};
allDrvs[n] = t;
count++;
}
int j;
for(j = 0; j < count; j++)
{
std::wcout << allDrvs[j] << std::endl;
}
The problem (at least one) is:
{
wchar_t t[] = {L'A' + n, '\0'};
allDrvs[n] = t; //allDrvs points to t
count++;
} //t is deallocated here
//allDrvs[n] is a dangling pointer
So, short answer - undefined behavior on the line std::wcout << allDrvs[j].
To get a correct output - there's a crappy ugly version involving dynamic allocation and copying between arrays.
Then there's the correct version of using a std::vector<std::wstring>.
Your t[] is on the stack; it only exists for one iteration of the loop at a time, and the next iteration appears to be reusing that space - not a behaviour that's required, but this seems to be what's happening based on your results. If you examine allDrvs[] with a debugger after the first loop completes, you'll probably see all the pointers point to the same memory location.
There's a variety of ways you could solve this. You can allocate a new t on the heap for each loop iteration (and delete them afterwards). You could do wchar_t allDrvs[26][2]; instead of wchar_t *allDrvs[26], and copy the contents of t over each iteration. You could display t right away in the first loop, instead of doing it later. You could use std::vector and std::wstring to manage things for you, instead of using arrays and pointers.
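One of these options, the wchar_t allDrvs[26][2] table, could be sketched as follows. The function name fill_drive_letters is invented for illustration, and like the original code it assumes 'A' to 'Z' are contiguous in the execution character set.

```cpp
// Fills a caller-owned table with L"A", L"B", ..., L"Z".
// The characters are copied into the table itself, so no pointer to a
// loop-local array is ever stored.
void fill_drive_letters(wchar_t (&allDrvs)[26][2])
{
    for (int n = 0; n < 26; ++n) {
        allDrvs[n][0] = static_cast<wchar_t>(L'A' + n);
        allDrvs[n][1] = L'\0';   // terminate each one-letter string
    }
}
```

Each row allDrvs[j] can then be passed to std::wcout exactly as in the second loop of the question, since every row has its own storage that lives as long as the table does.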
Your code has undefined behavior. Your t has automatic storage duration, so as soon as you exit the upper loop, it ceases to exist. Your allDrvs contains 26 pointers to objects that have been destroyed by the time you use them in the second loop.
As it happens, it looks like (under the circumstances you're running it, with the compiler you're using, etc.) what's happening is that it's re-using the same storage space for t at every iteration of the loop, and when you use allDrvs in the second loop, that storage hasn't been overwritten, so you have 26 pointers to the same data.
Since you're using C++ anyway, I'd advise using std::wstring and probably std::vector instead -- for example, something on this general order:
std::vector<std::wstring> allDrvs;
for (wchar_t c = L'A'; c <= L'Z'; c++)
allDrvs.push_back(std::wstring(1, c));
Technically, this isn't entirely portable -- it depends on 'A' .. 'Z' being contiguous, which isn't true with all character sets, IBM's EBCDIC being the obvious exception. Even in that case, it'll produce all the right outputs, but it'll also include a few additional items you didn't really want.
Nonetheless, the original depended on 'A'..'Z' being contiguous, and the code looks like it's probably intended for Windows anyway, so that's probably not really a big concern.